Preface


It was our privilege to serve as the program chairs for CAV 2023, the 35th International 
Conference on Computer-Aided Verification. CAV 2023 was held during July 19-22,
2023, and the pre-conference workshops were held during July 17-18, 2023. CAV 2023
was an in-person event, in Paris, France. 

CAV is an annual conference dedicated to the advancement of the theory and practice 
of computer-aided formal analysis methods for hardware and software systems. The 
primary focus of CAV is to extend the frontiers of verification techniques by expanding 
to new domains such as security, quantum computing, and machine learning. This puts 
CAV at the cutting edge of formal methods research, and this year’s program is a reflection
of this commitment. 

CAV 2023 received a large number of submissions (261). We accepted 15 tool 
papers, 3 case-study papers, and 49 regular papers, which amounts to an acceptance 
rate of roughly 26%. The accepted papers cover a wide spectrum of topics, from theo- 
retical results to applications of formal methods. These papers apply or extend formal 
methods to a wide range of domains such as concurrency, machine learning and neu- 
ral networks, quantum systems, as well as hybrid and stochastic systems. The program 
featured keynote talks by Ruzica Piskac (Yale University), Sumit Gulwani (Microsoft), 
and Caroline Trippel (Stanford University). In addition to the contributed talks, CAV 
also hosted the CAV Award ceremony, and a report from the Synthesis Competition 
(SYNTCOMP) chairs. 

In addition to the main conference, CAV 2023 hosted the following workshops: Meet- 
ing on String Constraints and Applications (MOSCA), Verification Witnesses and Their 
Validation (VeWit), Verification of Probabilistic Programs (VeriProP), Open Problems 
in Learning and Verification of Neural Networks (WOLVERINE), Deep Learning-aided 
Verification (DAV), Hyperproperties: Advances in Theory and Practice (HYPER),
Synthesis (SYNT), Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), and
Verification Mentoring Workshop (VMW). CAV 2023 also hosted a workshop dedicated
to Thomas A. Henzinger for his 60th birthday.

Organizing a flagship conference like CAV requires a great deal of effort from the 
community. The Program Committee for CAV 2023 consisted of 76 members—a com- 
mittee of this size ensures that each member has to review only a reasonable number of 
papers in the allotted time. In all, the committee members wrote over 730 reviews while 
investing significant effort to maintain and ensure the high quality of the conference pro- 
gram. We are grateful to the CAV 2023 Program Committee for their outstanding efforts 
in evaluating the submissions and making sure that each paper got a fair chance. As in
recent years at CAV, we made artifact evaluation mandatory for tool paper submissions,
but optional for the rest of the accepted papers. This year we received 48 artifact submis- 
sions, out of which 47 submissions received at least one badge. The Artifact Evaluation 
Committee consisted of 119 members who put in significant effort to evaluate each arti- 
fact. The goal of this process was to provide constructive feedback to tool developers and 
help make the research published in CAV more reproducible. We are also very grateful
to the Artifact Evaluation Committee for their hard work and dedication in evaluating 
the submitted artifacts. 

CAV 2023 would not have been possible without the tremendous help we received 
from several individuals, and we would like to thank everyone who helped make CAV 
2023 a success. We would like to thank Alessandro Cimatti, Isil Dillig, Javier Esparza, 
Azadeh Farzan, Joost-Pieter Katoen and Corina Pasareanu for serving as area chairs. 
We also thank Bernhard Kragl and Daniel Dietsch for chairing the Artifact Evaluation
Committee. We also thank Mohamed Faouzi Atig for chairing the workshop organization 
as well as leading publicity efforts, Eric Koskinen as the fellowship chair, Sebastian 
Bardin and Ruzica Piskac as sponsorship chairs, and Srinidhi Nagendra as the website 
chair. Srinidhi, along with Enrique Roman Calvo, helped prepare the proceedings. We 
also thank Ankush Desai, Eric Koskinen, Burcu Kulahcioglu Ozkan, Marijana Lazic, and 
Matteo Sammartino for chairing the mentoring workshop. Last but not least, we would 
like to thank the members of the CAV Steering Committee (Kenneth McMillan, Aarti 
Gupta, Orna Grumberg, and Daniel Kroening) for helping us with several important 
aspects of organizing CAV 2023. 

We hope that you will find the proceedings of CAV 2023 scientifically interesting 
and thought-provoking! 
Privacy-Preserving Automated Reasoning


Ruzica Piskac 
Yale University, USA 


Formal methods offer a vast collection of techniques to analyze and ensure the correctness 
of software and hardware systems against a given specification. In fact, modern formal 
methods tools scale to industrial applications. Despite this significant success, privacy 
requirements are not considered in the design of these tools. For example, when using 
automated reasoning tools, the implicit requirement is that the formula to be proved is 
public. This raises an issue if the formula itself reveals information that is supposed to 
remain private to one party. To overcome this issue, we propose the concept of privacy- 
preserving automated reasoning. 

We first consider the problem of privacy-preserving Boolean satisfiability [1]. In this 
problem, two mutually distrustful parties each provides a Boolean formula. The goal 
is to decide whether their conjunction is satisfiable without revealing either formula 
to the other party. We present an algorithm to solve this problem. Our algorithm is an 
oblivious variant of the classic DPLL algorithm and can be integrated with existing 
secure two-party computation techniques. 

We next turn to the problem where one party wants to prove to another party that 
their program satisfies a given specification without revealing the program. We split this 
problem into two subproblems: (1) proving that the program can be translated into a 
propositional formula without revealing either the program or the formula; (2) prov- 
ing that the obtained formula entails the specification. To solve the latter subproblem, 
we developed a zero-knowledge protocol for proving the unsatisfiability of formulas 
in propositional logic [2] (ZKUNSAT). Our protocol is based on a resolution proof of 
unsatisfiability. We encode verification of the resolution proof using polynomial equiv- 
alence checking, which enables us to use fast zero-knowledge protocols for polynomial 
satisfiability. 

Finally, we will outline future directions towards extending ZKUNSAT to first-order 
logic modulo theories (SMT) and translating programs to formulas in zero-knowledge
to realize fully automated privacy-preserving program verification. 
Enhancing Programming Experiences Using AI:
Leveraging LLMs as Analogical Reasoning Engines 
and Beyond 


Sumit Gulwani 


Microsoft, USA 


AI can significantly improve programming experiences for a diverse range of users: from pro- 
fessional developers and data scientists (proficient programmers) who need help in software 
engineering and data wrangling, to spreadsheet users (low-code programmers) needing help in 
authoring formulas, and students (novice programmers) seeking hints when tackling program- 
ming homework. To effectively communicate their needs to AI, users can express their intent 
explicitly through input-output examples or natural language specifications, or implicitly by 
presenting bugs or recent code edits for AI to analyze and suggest improvements. 

Analogical reasoning is at the heart of problem solving as it allows one to make sense of new
information and transfer knowledge from one domain to another. In this talk, I will demonstrate 
that analogical reasoning is a fundamental emergent capability of Large Language Models 
(LLMs) and can be utilized to enhance various types of programming experiences. 

However, there is significant room for innovation in building robust experiences tailored 
to specific task domains. I will discuss how various methods from symbolic AI (particularly 
programming-by-examples-or-analogies) such as search-and-rank, failure-guided refinement, 
and neuro-symbolic cooperation, can help fill this gap. This comes in three forms: (a) Prompt 
engineering that involves synthesizing specification-rich, context-aware prompts from vari- 
ous sources, sometimes using the LLM itself, to elicit optimal output. (b) Post-processing 
techniques that guide, rank, and validate the LLM’s output, occasionally employing the LLM 
for these purposes. (c) Multi-turn workflows that involve multiple LLM invocations, allowing 
the model more time and iterations to optimize results. I will illustrate these concepts using 
various capabilities in Excel, PowerQuery, and Visual Studio. 


Verified Software Security Down to Gates 


Caroline Trippel 


Stanford University, USA 


Hardware-software (HW-SW) contracts are critical for high-assurance computer systems 
design and an enabler for software design/analysis tools that find and repair hardware-related 
bugs in programs. E.g., memory consistency models define what values shared memory loads 
can return in a parallel program. Emerging security contracts define what program data is sus- 
ceptible to leakage via hardware side-channels and what speculative control- and data-flow 
is possible at runtime. However, these contracts and the analyses they support are useless if 
we cannot guarantee microarchitectural compliance, which is a “grand challenge.” Notably, 
some contracts are still evolving (e.g., security contracts), making hardware compliance a 
moving target. Even for mature contracts, comprehensively verifying that a complex microar- 
chitecture implements some abstract contract is a time-consuming endeavor involving teams 
of engineers, which typically requires resorting to incomplete proofs. 

Our work takes a radically different approach to the challenge above by synthesizing HW- 
SW contracts from advanced (i.e., industry-scale/complexity) processor implementations. In 
this talk, I will present our work on: synthesizing security contracts from processor specifi- 
cations written in Verilog; designing compiler approaches parameterized by these contracts 
that can find and repair hardware-related vulnerabilities in programs; and updating hardware 
microarchitectures to support scalable verification and efficient security-hardened programs. 
I will conclude by outlining remaining challenges in attaining the vision of verified software 
security down to gates. 
Abstract. We present an algorithm to learn a deterministic timed
automaton (DTA) via membership and equivalence queries. Our algo- 
rithm is an extension of the L* algorithm with a Myhill-Nerode style 
characterization of recognizable timed languages, which is the class of 
timed languages recognizable by DTAs. We first characterize the recog- 
nizable timed languages with a Nerode-style congruence. Using it, we 
give an algorithm with a smart teacher answering symbolic membership 
queries in addition to membership and equivalence queries. With a sym- 
bolic membership query, one can ask the membership of a certain set of 
timed words at one time. We prove that for any recognizable timed lan- 
guage, our learning algorithm returns a DTA recognizing it. We show how 
to answer a symbolic membership query with finitely many membership 
queries. We also show that our learning algorithm requires a polyno- 
mial number of queries with a smart teacher and an exponential number 
of queries with a normal teacher. We applied our algorithm to various 
benchmarks and confirmed its effectiveness with a normal teacher. 


Keywords: timed automata · active automata learning · recognizable
timed languages · L* algorithm · observation table


1 Introduction 


Active automata learning is a class of methods to infer an automaton recognizing
an unknown target language $\mathcal{L}_{\mathrm{tgt}} \subseteq \Sigma^*$ through finitely many queries to a
teacher. The L* algorithm [8], the best-known active DFA learning algorithm,
infers the minimum DFA recognizing $\mathcal{L}_{\mathrm{tgt}}$ using membership and equivalence
queries. In a membership query, the learner asks if a word $w \in \Sigma^*$ is in the
target language $\mathcal{L}_{\mathrm{tgt}}$, which is used to obtain enough information to construct
a hypothesis DFA $\mathcal{A}_{\mathrm{hyp}}$. Using an equivalence query, the learner checks if the
hypothesis $\mathcal{A}_{\mathrm{hyp}}$ recognizes the target language $\mathcal{L}_{\mathrm{tgt}}$. If $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \neq \mathcal{L}_{\mathrm{tgt}}$, the
teacher returns a counterexample $\mathit{cex} \in \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$ differentiating the target
language and the current hypothesis. The learner uses $\mathit{cex}$ to update $\mathcal{A}_{\mathrm{hyp}}$ to
classify $\mathit{cex}$ correctly. Such a learning algorithm has been combined with formal
verification, e.g., for testing [22,24,26,28] and controller synthesis [31].


© The Author(s) 2023 
C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 3-26, 2023. 
https://doi.org/10.1007/978-3-031-37706-8_1
[Figure 1: four panels; graphics and table cells omitted.]

(a) A DFA $\mathcal{A}$.

(b) Intermediate observation tables for learning $\mathcal{A}$. $a$ and $aa$ are deemed
equivalent with extensions $S = \{\varepsilon, a\}$ but distinguished with $S = \{\varepsilon, a, b\}$.

(c) A DTA $\mathcal{A}'$ with one clock variable $c$.

(d) Timed observation table for learning $\mathcal{A}'$. Each cell is indexed by a pair $(p, s) \in P \times S$
of elementary languages. The cell indexed by $(p, s)$ shows a constraint $\Lambda$ such that
$w \in p \cdot s$ satisfies $w \in \mathcal{L}_{\mathrm{tgt}}$ if and only if $\Lambda$ holds. The rows include
$p_1 = \{\tau_0 a \tau_1 \mid \tau_0 \in (0,1), \tau_1 \in (0,1), \tau_0 + \tau_1 \in (0,1)\}$ and
$p_2 = \{\tau_0 a \tau_1 a \tau_2 \mid \tau_0 \in (1,2), \tau_1 \in (0,1), \tau_2 \in (0,1), \tau_1 + \tau_2 \in (0,1)\}$.
Elementary languages $p_1$ and $p_2$ are deemed equivalent with the equation
$\tau_0^1 + \tau_1^1 = \tau_1^2 + \tau_2^2$, where $\tau_i^j$ represents $\tau_i$ in $p_j$.


Fig. 1. Illustration of observation tables in the L* algorithm for DFA learning (Fig. 1b)
and our algorithm for DTA learning (Fig. 1d)


Most of the DFA learning algorithms rely on the characterization of regular
languages by Nerode’s congruence. For a language $\mathcal{L}$, words $p$ and $p'$ are equivalent
if for any extension $s$, $p \cdot s \in \mathcal{L}$ if and only if $p' \cdot s \in \mathcal{L}$. It is well known that if
$\mathcal{L}$ is regular, such an equivalence relation has finitely many classes, corresponding to the
locations of the minimum DFA recognizing $\mathcal{L}$ (known as the Myhill-Nerode theorem;
see, e.g., [18]). Moreover, for any regular language $\mathcal{L}$, there is a finite set of extensions
$S$ such that $p$ and $p'$ are equivalent if and only if for any $s \in S$, $p \cdot s \in \mathcal{L}$ if and
only if $p' \cdot s \in \mathcal{L}$. Therefore, one can learn the minimum DFA by learning such
a finite set of extensions $S$ and the finitely many classes induced by Nerode’s congruence.

The L* algorithm learns the minimum DFA recognizing the target language
$\mathcal{L}_{\mathrm{tgt}}$ using a 2-dimensional array called an observation table. Figure 1b illustrates
observation tables. The rows and columns of an observation table are indexed
with finite sets of words $P$ and $S$, respectively. Each cell indexed by
$(p, s) \in P \times S$ shows if $p \cdot s \in \mathcal{L}_{\mathrm{tgt}}$. The column indices $S$ are the current extensions
approximating Nerode’s congruence. The L* algorithm increases $P$ and $S$ until:
1) the equivalence relation defined by $S$ converges to Nerode’s congruence and
2) $P$ covers all the classes induced by the congruence. The equivalence between
$p, p' \in P$ under $S$ can be checked by comparing the rows in the observation
table indexed with $p$ and $p'$. For example, Fig. 1b shows that $a$ and $aa$ are
deemed equivalent with extensions $S = \{\varepsilon, a\}$ but distinguished by adding $b$ to
$S$. The refinement of $P$ and $S$ is driven by certain conditions to validate the DFA
construction and by addressing the counterexample obtained by an equivalence
query.
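
The row comparison described above can be sketched in a few lines of Python. This is only an illustration: the target language (words containing "aab") and the names `member`, `row`, and `equivalent` are assumptions made for the example, not the DFA of Fig. 1a.

```python
def member(word: str) -> bool:
    """Hypothetical membership oracle: accepts words containing "aab"."""
    return "aab" in word


def row(prefix: str, suffixes: list[str]) -> tuple[bool, ...]:
    """One row of the observation table: membership of prefix + s for each s in S."""
    return tuple(member(prefix + s) for s in suffixes)


def equivalent(p1: str, p2: str, suffixes: list[str]) -> bool:
    """Two prefixes are deemed equivalent under S when their rows coincide."""
    return row(p1, suffixes) == row(p2, suffixes)


S = ["", "a"]                                # "" plays the role of the empty word
print(equivalent("a", "aa", S))              # rows agree under S = {"", "a"}
print(equivalent("a", "aa", S + ["b"]))      # adding "b" separates them
```

For this assumed language, the first call reports that the two rows coincide and the second reports that they differ, mirroring how adding a column refines the table.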
Timed words are extensions of conventional words with real-valued dwell time
between events. Timed languages, sets of timed words, are widely used to formalize
real-time systems and their properties, e.g., for formal verification. Among
various formalisms representing timed languages, timed automata (TAs) [4] are
among the most widely used. A TA is an extension of an NFA with finitely
many clock variables to represent timing constraints. Figure 1c shows an example.

Despite its practical relevance, learning algorithms for TAs are only available 
for limited subclasses of TAs, e.g., real-time automata [6,7], event-recording 
automata [15,16], event-recording automata with unobservable reset [17], and 
one-clock deterministic TAs [5,30]. Timing constraints representable by these 
classes are limited, e.g., by restricting the number of clock variables or by 
restricting the edges where a clock variable can be reset. Such restriction sim- 
plifies the inference of timing constraints in learning algorithms. 


Contributions. In this paper, we propose an active learning algorithm for deterministic
TAs (DTAs). The languages recognizable by DTAs are called recognizable
timed languages [21]. Our strategy is as follows: first, we develop a Myhill-Nerode
style characterization of recognizable timed languages; then, we extend
the L* algorithm for recognizable timed languages using the similarity of the
Myhill-Nerode style characterization.

Due to the continuity of dwell time in timed words, it is hard to characterize
recognizable timed languages by a Nerode-style congruence between timed
words. For example, for the DTA in Fig. 1c, for any $\tau, \tau' \in [0, 1)$ satisfying $\tau < \tau'$,
$(1 - \tau')a$ distinguishes $\tau$ and $\tau'$ because $\tau(1 - \tau')a$ leads to $l_0$ while $\tau'(1 - \tau')a$
leads to $l_1$: the time elapsed before $a$ is $\tau + (1 - \tau') < 1$ in the former and exactly $1$
in the latter. Therefore, such a congruence can have infinitely many classes.

Instead, we define a Nerode-style congruence between sets of timed words 
called elementary languages [21]. An elementary language is a timed language 
defined by a word with a conjunction of inequalities constraining the time difference
between events. We also use an equality constraint, which we call a
renaming equation, to define the congruence. Intuitively, a renaming equation
bridges the time differences in an elementary language and the clock variables 
in a TA. We note that there can be multiple renaming equations showing the 
equivalence of two elementary languages. 


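Example 1. Let $p_1$ and $p_2$ be elementary languages
$p_1 = \{\tau_0^1 a \tau_1^1 \mid \tau_0^1 \in (0,1), \tau_1^1 \in (0,1), \tau_0^1 + \tau_1^1 \in (0,1)\}$ and
$p_2 = \{\tau_0^2 a \tau_1^2 a \tau_2^2 \mid \tau_0^2 \in (1,2), \tau_1^2 \in (0,1), \tau_2^2 \in (0,1), \tau_1^2 + \tau_2^2 \in (0,1)\}$.
For the DTA in Fig. 1c, $p_1$ and $p_2$ are
equivalent with the renaming equation $\tau_0^1 + \tau_1^1 = \tau_1^2 + \tau_2^2$ because for any
$w_1 = \tau_0^1 a \tau_1^1 \in p_1$ and $w_2 = \tau_0^2 a \tau_1^2 a \tau_2^2 \in p_2$: 1) we reach $l_0$ after reading either
of $w_1$ and $w_2$ and 2) the values of $c$ after reading $w_1$ and $w_2$ are $\tau_0^1 + \tau_1^1$ and
$\tau_1^2 + \tau_2^2$, respectively.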


We characterize recognizable timed languages by the finiteness of the equiv- 
alence classes defined by the above congruence. We also show that for any rec- 
ognizable timed language, there is a finite set S of elementary languages such 
that the equivalence of any prefixes can be checked by the extensions S. 


6 M. Waga 


By using the above congruence, we extend the L* algorithm for DTAs. The
high-level idea is the same as the original L* algorithm: 1) the learner makes
membership queries to obtain enough information to construct a hypothesis DTA
$\mathcal{A}_{\mathrm{hyp}}$ and 2) the learner makes an equivalence query to check if $\mathcal{A}_{\mathrm{hyp}}$ recognizes
the target language. The largest difference is in the cells of an observation table.
Since the concatenation $p \cdot s$ of an index pair $(p, s) \in P \times S$ is not a timed word but
a set of timed words, its membership is not defined as a Boolean value. Instead,
we introduce the notion of symbolic membership and use it as the value of each
cell of the timed observation table. Intuitively, the symbolic membership is the
constraint representing the subset of $p \cdot s$ included in $\mathcal{L}_{\mathrm{tgt}}$. Such a constraint can
be constructed by finitely many (non-symbolic) membership queries.


Example 2. Figure 1d illustrates a timed observation table. The equivalence
between $p_1, p_2 \in P$ under $S$ can be checked by comparing the cells in the rows
indexed with $p_1$ and $p_2$ with renaming equations. For the cells in rows indexed
by $p_1$ and $p_2$, their constraints are the same by replacing $\tau_0 + \tau_1$ with $\tau_1 + \tau_2$
and vice versa. Thus, $p_1$ and $p_2$ are equivalent with the current extensions $S$.


Once the learner obtains enough information, it constructs a DTA via the
monoid-based representation of recognizable timed languages [21]. We show that
for any recognizable timed language, our algorithm terminates and returns a
DTA recognizing it. We also show that the number of necessary queries is
polynomial in the size of the equivalence class defined by the Nerode-style
congruence if symbolic membership queries are allowed and, otherwise, exponential
in it. Moreover, if symbolic membership queries are not allowed, the number of
necessary queries is at most doubly exponential in the number of clock
variables of a DTA recognizing the target language and singly exponential in the
number of locations of a DTA recognizing the target language. This worst-case
complexity is the same as that of the one-clock DTA learning algorithm in [30].

We implemented our DTA learning algorithm in a prototype library
LearnTA. Our experimental results show that it is efficient enough for some
benchmarks taken from practical applications, e.g., the FDDI protocol. This
suggests the practical relevance of our algorithm.

The following summarizes our contribution. 


— We characterize recognizable timed languages by a Nerode-style congruence. 
— Using the above characterization, we give an active DTA learning algorithm. 
— Our experiment results suggest its practical relevance. 


Related Work. Among various characterizations of timed languages [4, 10-13, 21],
the characterization by recognizability [21] is closest to our Myhill-Nerode-style 
characterization. Both of them use finite sets of elementary languages for char- 
acterization. Their main difference is that [21] proposes a formalism to define a 
timed language by relating prefixes by a morphism, whereas we propose a tech- 
nical gadget to define an equivalence relation over timed words with respect to 
suffixes using symbolic membership. This difference makes our definition suitable 
for an L*-style algorithm, where the original L* algorithm is based on Nerode’s 
congruence, which defines an equivalence relation over words with respect to
suffixes using conventional membership. 

As we have discussed so far, active TA learning [5, 15-17,30] has been studied 
mostly for limited subclasses of TAs, where the number of the clock variables or 
the clock variables reset at each edge is fixed. In contrast, our algorithm infers
both of these. Another difference is in the technical strategy.
Most of the existing algorithms are related to the active learning of symbolic 
automata [9,14], enhancing the languages with clock valuations. In contrast, we 
take a more semantic approach via the Nerode-style congruence. 

Another recent direction is to use a genetic algorithm to infer TAs in pas- 
sive [27] or active [3] learning. This differs from our learning algorithm based on 
a formal characterization of timed languages. Moreover, these algorithms may
not converge to the correct automaton because they rely on a genetic algorithm.


2 Preliminaries 


For a set $X$, its powerset is denoted by $\mathcal{P}(X)$. We denote the empty sequence
by $\varepsilon$. For sets $X, Y$, we denote their symmetric difference by
$X \triangle Y = \{x \mid x \in X \land x \notin Y\} \cup \{y \mid y \in Y \land y \notin X\}$.


2.1 Timed Words and Timed Automata 


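Definition 3 (timed word). For a finite alphabet $\Sigma$, a timed word $w$ is an
alternating sequence $\tau_0 a_1 \tau_1 a_2 \ldots a_n \tau_n$ of $\Sigma$ and $\mathbb{R}_{\geq 0}$. The set of timed words over
$\Sigma$ is denoted by $\mathcal{T}(\Sigma)$. A timed language $\mathcal{L} \subseteq \mathcal{T}(\Sigma)$ is a set of timed words.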


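For timed words $w = \tau_0 a_1 \tau_1 a_2 \ldots a_n \tau_n$ and $w' = \tau'_0 a'_1 \tau'_1 a'_2 \ldots a'_{n'} \tau'_{n'}$, their
concatenation $w \cdot w'$ is $w \cdot w' = \tau_0 a_1 \tau_1 a_2 \ldots a_n (\tau_n + \tau'_0) a'_1 \tau'_1 a'_2 \ldots a'_{n'} \tau'_{n'}$. The
concatenation is naturally extended to timed languages: for a timed word $w$ and
timed languages $\mathcal{L}, \mathcal{L}'$, we let $w \cdot \mathcal{L} = \{w \cdot w_{\mathcal{L}} \mid w_{\mathcal{L}} \in \mathcal{L}\}$, $\mathcal{L} \cdot w = \{w_{\mathcal{L}} \cdot w \mid w_{\mathcal{L}} \in \mathcal{L}\}$,
and $\mathcal{L} \cdot \mathcal{L}' = \{w_{\mathcal{L}} \cdot w_{\mathcal{L}'} \mid w_{\mathcal{L}} \in \mathcal{L}, w_{\mathcal{L}'} \in \mathcal{L}'\}$. For timed words $w$ and $w'$, $w$ is a
prefix of $w'$ if there is a timed word $w''$ satisfying $w \cdot w'' = w'$. A timed language
$\mathcal{L}$ is prefix-closed if for any $w \in \mathcal{L}$, $\mathcal{L}$ contains all the prefixes of $w$.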
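
As a small illustration of the concatenation and prefix definitions above, the following Python sketch stores a timed word as its tuple of dwell times and its tuple of events. The representation and the names `TimedWord`, `make`, `concat`, and `is_prefix` are assumptions chosen for this example; exact rationals are used so that sums such as 0.5 + 0.2 compare exactly.

```python
from fractions import Fraction
from typing import NamedTuple


class TimedWord(NamedTuple):
    delays: tuple  # (tau_0, ..., tau_n), one more entry than events
    events: tuple  # (a_1, ..., a_n)


def make(*items) -> TimedWord:
    """Build a timed word from an alternating sequence tau_0, a_1, tau_1, ..."""
    delays = tuple(Fraction(str(x)) for x in items[0::2])
    events = tuple(items[1::2])
    assert len(delays) == len(events) + 1
    return TimedWord(delays, events)


def concat(w: TimedWord, w2: TimedWord) -> TimedWord:
    """w . w2: the last delay of w and the first delay of w2 are added up."""
    merged = w.delays[-1] + w2.delays[0]
    return TimedWord(w.delays[:-1] + (merged,) + w2.delays[1:],
                     w.events + w2.events)


def is_prefix(w: TimedWord, w2: TimedWord) -> bool:
    """w is a prefix of w2 iff some timed word w3 satisfies w . w3 == w2."""
    n = len(w.events)
    return (n <= len(w2.events)
            and w.events == w2.events[:n]
            and w.delays[:-1] == w2.delays[:n]
            and w.delays[-1] <= w2.delays[n])


w = concat(make(1.3, "a", 0.5), make(0.2, "a", 0))
print(w == make(1.3, "a", 0.7, "a", 0))
print(is_prefix(make(1.3, "a", 0.5), w))
```

The last dwell time of a prefix only needs to be at most the corresponding dwell time of the longer word, because the remainder becomes the first dwell time of the suffix.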

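For a finite set $C$ of clock variables, a clock valuation is a function
$\nu \in (\mathbb{R}_{\geq 0})^C$. We let $0_C$ be the clock valuation satisfying $0_C(c) = 0$ for any $c \in C$.
For $\nu \in (\mathbb{R}_{\geq 0})^C$ and $\tau \in \mathbb{R}_{\geq 0}$, we let $\nu + \tau$ be the clock valuation satisfying
$(\nu + \tau)(c) = \nu(c) + \tau$ for any $c \in C$. For $\nu \in (\mathbb{R}_{\geq 0})^C$ and $\rho \subseteq C$, we let $\nu[\rho := 0]$
be the clock valuation satisfying $(\nu[\rho := 0])(c) = 0$ for $c \in \rho$ and $(\nu[\rho := 0])(c) =
\nu(c)$ for $c \notin \rho$. We let $\mathcal{G}_C$ be the set of constraints defined by a finite conjunction
of inequalities $c \bowtie d$, where $c \in C$, $d \in \mathbb{N}$, and $\bowtie \in \{>, \geq, <, \leq\}$. We let $\mathcal{C}_C$
be the set of constraints defined by a finite conjunction of inequalities $c \bowtie d$ or
$c - c' \bowtie d$, where $c, c' \in C$, $d \in \mathbb{N}$, and $\bowtie \in \{>, \geq, <, \leq\}$. We denote $\bigwedge \emptyset$ by $\top$.
For $\nu \in (\mathbb{R}_{\geq 0})^C$ and $\varphi \in \mathcal{C}_C \cup \mathcal{G}_C$, we denote $\nu \models \varphi$ if $\nu$ satisfies $\varphi$.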
Definition 4 (timed automaton). A timed automaton (TA) is a 7-tuple
$(\Sigma, L, l_0, C, I, \Delta, F)$, where: $\Sigma$ is the finite alphabet, $L$ is the finite set of
locations, $l_0 \in L$ is the initial location, $C$ is the finite set of clock variables,
$I : L \to \mathcal{C}_C$ is the invariant of each location, $\Delta \subseteq L \times \mathcal{G}_C \times (\Sigma \cup \{\varepsilon\}) \times \mathcal{P}(C) \times L$
is the set of edges, and $F \subseteq L$ is the set of accepting locations.


A TA is deterministic if 1) for any $a \in \Sigma$ and $(l, g, a, \rho, l'), (l, g', a, \rho', l'') \in \Delta$,
$g \land g'$ is unsatisfiable, or 2) for any $(l, g, \varepsilon, \rho, l') \in \Delta$, $g \land I(l)$ is at most a singleton.
Figure 1c shows a deterministic TA (DTA).

The semantics of a TA is defined by a timed transition system (TTS). 


Definition 5 (semantics of TAs). For a TA $\mathcal{A} = (\Sigma, L, l_0, C, I, \Delta, F)$, the
timed transition system (TTS) is a 4-tuple $\mathcal{S} = (Q, q_0, Q_F, \to)$, where:
$Q = L \times (\mathbb{R}_{\geq 0})^C$ is the set of (concrete) states, $q_0 = (l_0, 0_C)$ is the initial state,
$Q_F = \{(l, \nu) \in Q \mid l \in F\}$ is the set of accepting states, and $\to \subseteq Q \times Q$ is the
transition relation consisting of the following¹.


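- For each $(l, \nu) \in Q$ and $\tau \in \mathbb{R}_{>0}$, we have $(l, \nu) \xrightarrow{\tau} (l, \nu + \tau)$ if $\nu + \tau' \models I(l)$
holds for each $\tau' \in [0, \tau)$.

- For each $(l, \nu), (l', \nu') \in Q$, $a \in \Sigma$, and $(l, g, a, \rho, l') \in \Delta$, we have $(l, \nu) \xrightarrow{a}
(l', \nu')$ if we have $\nu \models g$ and $\nu' = \nu[\rho := 0]$.

- For each $(l, \nu), (l', \nu') \in Q$, $\tau \in \mathbb{R}_{>0}$, and $(l, g, \varepsilon, \rho, l') \in \Delta$, we have $(l, \nu) \xrightarrow{\varepsilon, \tau}
(l', \nu' + \tau)$ if we have $\nu \models g$, $\nu' = \nu[\rho := 0]$, and $\forall \tau' \in [0, \tau).\ \nu' + \tau' \models I(l')$.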


A run of a TA $\mathcal{A}$ is an alternating sequence $q_0, \to_1, q_1, \ldots, \to_n, q_n$ of
$q_i \in Q$ and $\to_i \in \to$ satisfying $q_{i-1} \to_i q_i$ for any $i \in \{1, 2, \ldots, n\}$. A run
$q_0, \to_1, q_1, \ldots, \to_n, q_n$ is accepting if $q_n \in Q_F$. Given such a run, the associated
timed word is the concatenation of the labels of the transitions. The timed language
$\mathcal{L}(\mathcal{A})$ of a TA $\mathcal{A}$ is the set of timed words associated with some accepting
run of $\mathcal{A}$.
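
The semantics above can be made concrete with a minimal simulator. The sketch below rests on several assumptions: the one-clock automaton in `EDGES` is hypothetical (chosen to behave like the examples in the introduction, but not claimed to be the DTA of Fig. 1c), all invariants are taken to be $\top$, there are no $\varepsilon$-edges, and a zero dwell time is treated as a no-op step.

```python
from fractions import Fraction

# Hypothetical one-clock DTA used only for illustration.
# Each edge is (source, guard, event, clocks_to_reset, target).
EDGES = [
    ("l0", lambda v: v["c"] < 1, "a", (), "l0"),
    ("l0", lambda v: v["c"] >= 1, "a", ("c",), "l1"),
    ("l1", lambda v: True, "a", (), "l0"),
]
INITIAL = "l0"
ACCEPTING = {"l0"}
CLOCKS = ("c",)


def run(delays, events):
    """Follow the unique run on tau_0 a_1 tau_1 ... a_n tau_n.

    Returns the final state (location, valuation), or None when no edge
    is enabled for some event.
    """
    assert len(delays) == len(events) + 1
    location = INITIAL
    valuation = {c: Fraction(0) for c in CLOCKS}
    for i, tau in enumerate(delays):
        # Delay transition: every clock advances by tau.
        valuation = {c: x + tau for c, x in valuation.items()}
        if i == len(events):
            break
        enabled = [e for e in EDGES
                   if e[0] == location and e[2] == events[i] and e[1](valuation)]
        if not enabled:
            return None
        # Determinism guarantees at most one enabled edge.
        _, _, _, resets, location = enabled[0]
        # Discrete transition: reset the clocks listed on the edge.
        valuation = {c: Fraction(0) if c in resets else x
                     for c, x in valuation.items()}
    return location, valuation


def accepts(delays, events):
    """A timed word is accepted when its run ends in an accepting location."""
    final = run(delays, events)
    return final is not None and final[0] in ACCEPTING


# tau = 1/4 and tau' = 1/2 followed by the suffix (1 - tau') a.
print(run([Fraction(3, 4), Fraction(0)], ["a"]))
print(run([Fraction(1), Fraction(0)], ["a"]))
```

Under these assumptions, the two calls end in different locations, which is the kind of behavior that makes a word-level Nerode congruence have infinitely many classes.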


2.2 Recognizable Timed Languages 


Here, we review the recognizability [21] of timed languages. 


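Definition 6 (timed condition). For a set $\mathbb{T} = \{\tau_0, \tau_1, \ldots, \tau_n\}$ of ordered
variables, a timed condition $\Lambda$ is a finite conjunction of inequalities $\mathbb{T}_{i,j} \bowtie d$,
where $\mathbb{T}_{i,j} = \sum_{k=i}^{j} \tau_k$, $\bowtie \in \{>, \geq, <, \leq\}$, and $d \in \mathbb{N}$.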


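A timed condition $\Lambda$ is simple² if for each $\mathbb{T}_{i,j}$, $\Lambda$ contains $d < \mathbb{T}_{i,j} < d + 1$ or
$d \leq \mathbb{T}_{i,j} \land \mathbb{T}_{i,j} \leq d$ for some $d \in \mathbb{N}$. A timed condition $\Lambda$ is canonical if we cannot
strengthen or add any inequality $\mathbb{T}_{i,j} \bowtie d$ to $\Lambda$ without changing its semantics.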


¹ We use $\xrightarrow{\varepsilon, \tau}$ to avoid the discussion with an arbitrary small dwell time in [21].
² The notion of simplicity is taken from [15].
Definition 7 (elementary language). A timed language $\mathcal{L}$ is elementary if
there are $u = a_1, a_2, \ldots, a_n \in \Sigma^*$ and a timed condition $\Lambda$ over $\{\tau_0, \tau_1, \ldots, \tau_n\}$
satisfying $\mathcal{L} = \{\tau_0 a_1 \tau_1 a_2 \ldots a_n \tau_n \mid \tau_0, \tau_1, \ldots, \tau_n \models \Lambda\}$, and the set of valuations
of $\{\tau_0, \tau_1, \ldots, \tau_n\}$ defined by $\Lambda$ is bounded. We denote such $\mathcal{L}$ by $(u, \Lambda)$. We let
$\mathcal{E}(\Sigma)$ be the set of elementary languages over $\Sigma$.


For $p, p' \in \mathcal{E}(\Sigma)$, $p$ is a prefix of $p'$ if for any $w' \in p'$, there is a prefix $w \in p$
of $w'$, and for any $w \in p$, there is $w' \in p'$ such that $w$ is a prefix of $w'$. For any
elementary language, the number of its prefixes is finite. For a set of elementary
languages, prefix-closedness is defined based on the above definition of prefixes.

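An elementary language $(u, \Lambda)$ is simple if there is a simple and canonical
timed condition $\Lambda'$ satisfying $(u, \Lambda) = (u, \Lambda')$. We let $\mathcal{SE}(\Sigma)$ be the set of simple
elementary languages over $\Sigma$. Without loss of generality, we assume that for any
$(u, \Lambda) \in \mathcal{SE}(\Sigma)$, $\Lambda$ is simple and canonical. We remark that no DTA can
distinguish timed words in a simple elementary language, i.e., for any $p \in \mathcal{SE}(\Sigma)$
and a DTA $\mathcal{A}$, we have either $p \subseteq \mathcal{L}(\mathcal{A})$ or $p \cap \mathcal{L}(\mathcal{A}) = \emptyset$. We can decide if
$p \subseteq \mathcal{L}(\mathcal{A})$ or $p \cap \mathcal{L}(\mathcal{A}) = \emptyset$ by taking some $w \in p$ and checking if $w \in \mathcal{L}(\mathcal{A})$.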


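Definition 8 (immediate exterior). Let $\mathcal{L} = (u, \Lambda)$ be an elementary language.
For $a \in \Sigma$, the discrete immediate exterior $\mathrm{ext}^a(\mathcal{L})$ of $\mathcal{L}$ is $\mathrm{ext}^a(\mathcal{L}) =
(u \cdot a, \Lambda \cup \{\tau_{|u|+1} = 0\})$. The continuous immediate exterior $\mathrm{ext}^t(\mathcal{L})$ of $\mathcal{L}$ is
$\mathrm{ext}^t(\mathcal{L}) = (u, \Lambda^t)$, where $\Lambda^t$ is the timed condition such that each inequality
$\mathbb{T}_{i,|u|} = d$ in $\Lambda$ is replaced with $\mathbb{T}_{i,|u|} > d$ if such an inequality exists, and
otherwise, the inequality $\mathbb{T}_{i,|u|} < d$ in $\Lambda$ with the smallest index $i$ is replaced with
$\mathbb{T}_{i,|u|} = d$. The immediate exterior of $\mathcal{L}$ is $\mathrm{ext}(\mathcal{L}) = \bigcup_{a \in \Sigma} \mathrm{ext}^a(\mathcal{L}) \cup \mathrm{ext}^t(\mathcal{L})$.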


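Example 9. For a word $u = a \cdot a$ and a timed condition $\Lambda = \{\mathbb{T}_{0,0} \in (1,2) \land \mathbb{T}_{0,1} \in
(1,2) \land \mathbb{T}_{0,2} \in (1,2) \land \mathbb{T}_{1,2} \in (0,1) \land \mathbb{T}_{2,2} = 0\}$, we have $1.3 \cdot a \cdot 0.5 \cdot a \cdot 0 \in (u, \Lambda)$
and $1.7 \cdot a \cdot 0.5 \cdot a \cdot 0 \notin (u, \Lambda)$. The discrete and continuous immediate exteriors
of $(u, \Lambda)$ are $\mathrm{ext}^a((u, \Lambda)) = (u \cdot a, \Lambda^a)$ and $\mathrm{ext}^t((u, \Lambda)) = (u, \Lambda^t)$, where $\Lambda^a =
\{\mathbb{T}_{0,0} \in (1,2) \land \mathbb{T}_{0,1} \in (1,2) \land \mathbb{T}_{0,2} \in (1,2) \land \mathbb{T}_{1,2} \in (0,1) \land \mathbb{T}_{2,2} = \mathbb{T}_{3,3} = 0\}$ and
$\Lambda^t = \{\mathbb{T}_{0,0} \in (1,2) \land \mathbb{T}_{0,1} \in (1,2) \land \mathbb{T}_{0,2} \in (1,2) \land \mathbb{T}_{1,2} \in (0,1) \land \mathbb{T}_{2,2} > 0\}$.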
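
The membership claims in Example 9 can be checked mechanically. The following Python sketch encodes a timed condition as a list of triples `(i, j, predicate)`; this encoding and the names `T`, `between`, `member`, `LAMBDA`, and `LAMBDA_T` are assumptions made for the illustration.

```python
from fractions import Fraction


def T(delays, i, j):
    """T_{i,j}: the sum tau_i + ... + tau_j."""
    return sum(delays[i:j + 1], Fraction(0))


def between(low, high):
    """Predicate for the open interval (low, high)."""
    return lambda x: low < x < high


# Timed condition of Example 9 as (i, j, predicate on T_{i,j}) triples.
LAMBDA = [
    (0, 0, between(1, 2)),
    (0, 1, between(1, 2)),
    (0, 2, between(1, 2)),
    (1, 2, between(0, 1)),
    (2, 2, lambda x: x == 0),
]
# Continuous immediate exterior: T_{2,2} = 0 becomes T_{2,2} > 0.
LAMBDA_T = LAMBDA[:-1] + [(2, 2, lambda x: x > 0)]


def member(events, delays, u, condition):
    """Check whether a timed word lies in the elementary language (u, condition)."""
    delays = [Fraction(str(d)) for d in delays]  # exact arithmetic on decimals
    return (list(events) == list(u)
            and len(delays) == len(u) + 1
            and all(pred(T(delays, i, j)) for i, j, pred in condition))


u = ["a", "a"]
print(member(["a", "a"], [1.3, 0.5, 0], u, LAMBDA))      # True
print(member(["a", "a"], [1.7, 0.5, 0], u, LAMBDA))      # False
print(member(["a", "a"], [1.3, 0.5, 0.1], u, LAMBDA_T))  # True
```

The second word is rejected because $\mathbb{T}_{0,1} = 1.7 + 0.5 = 2.2 \notin (1, 2)$; the third shows a word that leaves $(u, \Lambda)$ by letting time pass after the last event.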


Definition 10 (chronometric timed language). A timed language $\mathcal{L}$ is
chronometric if there is a finite set $\{(u_1, \Lambda_1), (u_2, \Lambda_2), \ldots, (u_m, \Lambda_m)\}$ of disjoint
elementary languages satisfying $\mathcal{L} = \bigcup_{i \in \{1, 2, \ldots, m\}} (u_i, \Lambda_i)$.


For any elementary language $\mathcal{L}$, its immediate exterior $\mathrm{ext}(\mathcal{L})$ is chronometric.
We naturally extend the notion of exterior to chronometric timed languages,
i.e., for a chronometric timed language $\mathcal{L} = \bigcup_{i \in \{1, 2, \ldots, m\}} (u_i, \Lambda_i)$, we
let $\mathrm{ext}(\mathcal{L}) = \bigcup_{i \in \{1, 2, \ldots, m\}} \mathrm{ext}((u_i, \Lambda_i))$, which is also chronometric. For a timed
word $w = \tau_0 a_1 \tau_1 a_2 \ldots a_n \tau_n$, we denote the valuation of $\tau_0, \tau_1, \ldots, \tau_n$ by $\kappa(w)$.

A chronometric relational morphism [21] relates any timed word to a timed
word in a certain set $P$, which is later used to define a timed language. Intuitively,
the tuples in $\Phi$ specify a mapping from timed words immediately outside $P$ to
timed words in $P$. By inductively applying it, any timed word is mapped to $P$.
Definition 11 (chronometric relational morphism). Let $P$ be a chronometric
and prefix-closed timed language. Let $(u, \Lambda, u', \Lambda', R)$ be a 5-tuple such
that $(u, \Lambda) \subseteq \mathrm{ext}(P)$, $(u', \Lambda') \subseteq P$, and $R$ is a finite conjunction of equations
of the form $\mathbb{T}_{i,|u|} = \mathbb{T}'_{j,|u'|}$, where $i \leq |u|$ and $j \leq |u'|$. For such a tuple,
we let $[\![(u, \Lambda, u', \Lambda', R)]\!] \subseteq (u, \Lambda) \times (u', \Lambda')$ be the relation such that $(w, w') \in
[\![(u, \Lambda, u', \Lambda', R)]\!]$ if and only if $\kappa(w), \kappa(w') \models R$. For a finite set $\Phi$ of such tuples,
the chronometric relational morphism $[\![\Phi]\!] \subseteq \mathcal{T}(\Sigma) \times P$ is the relation inductively
defined as follows: 1) for $w \in P$, we have $(w, w) \in [\![\Phi]\!]$; 2) for $w \in \mathrm{ext}(P)$ and
$w' \in P$, we have $(w, w') \in [\![\Phi]\!]$ if we have $(w, w') \in [\![(u, \Lambda, u', \Lambda', R)]\!]$ for one
of the tuples $(u, \Lambda, u', \Lambda', R) \in \Phi$; 3) for $w \in \mathrm{ext}(P)$, $w' \in \mathcal{T}(\Sigma)$, and $w'' \in P$,
we have $(w \cdot w', w'') \in [\![\Phi]\!]$ if there is $w''' \in \mathcal{T}(\Sigma)$ satisfying $(w, w''') \in [\![\Phi]\!]$ and
$(w''' \cdot w', w'') \in [\![\Phi]\!]$. We also require that all $(u, \Lambda)$ in the tuples in $\Phi$ must be
disjoint and the union of each such $(u, \Lambda)$ is $\mathrm{ext}(P) \setminus P$.


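A chronometric relational morphism $\llbracket \Phi \rrbracket$ is compatible with $F \subseteq P$ if for each tuple $(u, \Lambda, u', \Lambda', R)$ defining $\llbracket \Phi \rrbracket$, we have either $(u', \Lambda') \subseteq F$ or $(u', \Lambda') \cap F = \emptyset$.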


Definition 12 (recognizable timed language). A timed language $\mathcal{L}$ is recognizable if there is a chronometric prefix-closed set $P$, a chronometric subset $F$ of $P$, and a chronometric relational morphism $\llbracket \Phi \rrbracket \subseteq \mathcal{T}(\Sigma) \times P$ compatible with $F$ satisfying $\mathcal{L} = \{w \mid \exists w' \in F, (w, w') \in \llbracket \Phi \rrbracket\}$.


It is known that for any recognizable timed language $\mathcal{L}$, we can construct a DTA $\mathcal{A}$ recognizing $\mathcal{L}$, and vice versa [21].


2.3 Distinguishing Extensions and Active DFA Learning 


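Most DFA learning algorithms are based on Nerode's congruence [18]. For a (not necessarily regular) language $\mathcal{L} \subseteq \Sigma^*$, Nerode's congruence $\equiv_{\mathcal{L}} \subseteq \Sigma^* \times \Sigma^*$ is the equivalence relation satisfying $w \equiv_{\mathcal{L}} w'$ if and only if for any $w'' \in \Sigma^*$, we have $w \cdot w'' \in \mathcal{L} \iff w' \cdot w'' \in \mathcal{L}$.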

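Generally, we cannot decide if $w \equiv_{\mathcal{L}} w'$ by testing because it requires infinitely many membership checks. However, if $\mathcal{L}$ is regular, there is a finite set of suffixes $S \subseteq \Sigma^*$ called distinguishing extensions satisfying $\equiv_{\mathcal{L}} = \sim^{S}_{\mathcal{L}}$, where $\sim^{S}_{\mathcal{L}}$ is the equivalence relation satisfying $w \sim^{S}_{\mathcal{L}} w'$ if and only if for any $w'' \in S$, we have $w \cdot w'' \in \mathcal{L} \iff w' \cdot w'' \in \mathcal{L}$. Thus, the minimum DFA recognizing $\mathcal{L}_{\mathrm{tgt}}$ can be learned by³: i) identifying distinguishing extensions $S$ satisfying $\equiv_{\mathcal{L}_{\mathrm{tgt}}} = \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$ and ii) constructing the minimum DFA $\mathcal{A}$ corresponding to $\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$.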
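The relation $\sim^{S}_{\mathcal{L}}$ is decidable by finitely many membership checks, which is what makes it usable in learning. A minimal sketch, assuming untimed words are Python strings and the language is given as a membership predicate; `equivalent_on` and the two example languages are hypothetical names chosen for illustration.

```python
def equivalent_on(member, w1, w2, suffixes):
    """w1 ~^S_L w2: both words agree on membership after every suffix in S."""
    return all(member(w1 + s) == member(w2 + s) for s in suffixes)


def even_number_of_a(word):
    """Example regular language: words with an even number of 'a'."""
    return word.count("a") % 2 == 0


def ends_with_ab(word):
    """Example regular language: words ending with 'ab'."""
    return word.endswith("ab")


# With S = {epsilon}, "" and "aa" are equivalent, while "" and "a" are not.
print(equivalent_on(even_number_of_a, "", "aa", [""]))
print(equivalent_on(even_number_of_a, "", "a", [""]))

# For the second language, S = {epsilon} is too coarse: adding "b" separates "" and "a".
print(equivalent_on(ends_with_ab, "", "a", [""]))
print(equivalent_on(ends_with_ab, "", "a", ["", "b"]))
```

The second pair of calls illustrates why $S$ has to be refined during learning: a too-small $S$ merges words that Nerode's congruence keeps apart.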

The L* algorithm [8] is an algorithm to learn the minimum DFA $\mathcal{A}_{\mathrm{hyp}}$ recognizing the target regular language $\mathcal{L}_{\mathrm{tgt}}$ with finitely many membership and equivalence queries to the teacher. In a membership query, the learner asks if $w \in \Sigma^*$ belongs to the target language $\mathcal{L}_{\mathrm{tgt}}$, i.e., $w \in \mathcal{L}_{\mathrm{tgt}}$. In an equivalence query, the learner asks if the hypothesis DFA $\mathcal{A}_{\mathrm{hyp}}$ recognizes the target language $\mathcal{L}_{\mathrm{tgt}}$, i.e., $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$, where $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$ is the language of the hypothesis DFA $\mathcal{A}_{\mathrm{hyp}}$. When we have $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \neq \mathcal{L}_{\mathrm{tgt}}$, the teacher returns a counterexample $\mathit{cex} \in \mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \triangle \mathcal{L}_{\mathrm{tgt}}$. The information obtained via queries is stored in a 2-dimensional array called an observation table. See Fig. 1b for an illustration. For finite index sets $P, S \subseteq \Sigma^*$, for each pair $(p, s) \in (P \cup P \cdot \Sigma) \times S$, the observation table stores whether $p \cdot s \in \mathcal{L}_{\mathrm{tgt}}$. $S$ is the current candidate of the distinguishing extensions, and $P$ represents $\Sigma^* / \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$. Since $P$ and $S$ are finite, one can fill the observation table using finitely many membership queries.

³ The distinguishing extensions $S$ can be defined locally. For example, the TTT algorithm [19] is optimized with local distinguishing extensions for some prefixes $w \in \Sigma^*$. Nevertheless, we use the global distinguishing extensions for simplicity.

Algorithm 1: Outline of an L*-style active DFA learning algorithm
  $P \leftarrow \{\varepsilon\}$; $S \leftarrow \{\varepsilon\}$
  while $\top$ do
    while the observation table is not closed or consistent do
      update $P$ and $S$ so that the observation table is closed and consistent
    $\mathcal{A}_{\mathrm{hyp}} \leftarrow \mathrm{ConstructDFA}(P, S, T)$
    switch $\mathrm{eq}_{\mathcal{L}_{\mathrm{tgt}}}(\mathcal{A}_{\mathrm{hyp}})$ do
      case $\top$ do
        return $\mathcal{A}_{\mathrm{hyp}}$
      case $\mathit{cex}$ do
        update $P$ and/or $S$ using $\mathit{cex}$

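Algorithm 1 outlines an L*-style algorithm. We start from $P = S = \{\varepsilon\}$ and incrementally increase them. To construct a hypothesis DFA $\mathcal{A}_{\mathrm{hyp}}$, the observation table must be closed and consistent. An observation table is closed if, for each $p \in P \cdot \Sigma$, there is $p' \in P$ satisfying $p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p'$. An observation table is consistent if, for any $p, p' \in P \cup P \cdot \Sigma$ and $a \in \Sigma$, $p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p'$ implies $p \cdot a \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p' \cdot a$.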

Once the observation table becomes closed and consistent, the learner constructs a hypothesis DFA $\mathcal{A}_{\mathrm{hyp}}$ and checks if $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$ by an equivalence query. If $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$ holds, $\mathcal{A}_{\mathrm{hyp}}$ is the resulting DFA. Otherwise, the teacher returns $\mathit{cex} \in \mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \triangle \mathcal{L}_{\mathrm{tgt}}$, which is used to refine the observation table. There are several variants of the refinement. In the L* algorithm, all the prefixes of $\mathit{cex}$ are added to $P$. In the Rivest-Schapire algorithm [20,25], an extension $s$ strictly refining $S$ is obtained by an analysis of $\mathit{cex}$, and such $s$ is added to $S$.
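The closedness and consistency conditions of the untimed observation table can be sketched as follows. This is a minimal illustration, assuming words are Python strings and that rows are recomputed from a membership predicate rather than stored; `row`, `is_closed`, and `is_consistent` are hypothetical names, and consistency is checked over $P \cup P \cdot \Sigma$ as stated in the text.

```python
from itertools import product


def row(member, prefix, suffixes):
    """The row of the observation table for a prefix: membership of prefix + s for each s in S."""
    return tuple(member(prefix + s) for s in suffixes)


def is_closed(member, P, S, alphabet):
    """Closed: every one-letter extension of P has the same row as some element of P."""
    rows_of_P = {row(member, p, S) for p in P}
    return all(row(member, p + a, S) in rows_of_P for p in P for a in alphabet)


def is_consistent(member, P, S, alphabet):
    """Consistent: prefixes with equal rows still have equal rows after appending any letter."""
    candidates = set(P) | {p + a for p in P for a in alphabet}
    for p, q in product(candidates, repeat=2):
        if row(member, p, S) == row(member, q, S):
            if any(row(member, p + a, S) != row(member, q + a, S) for a in alphabet):
                return False
    return True


def even_number_of_a(word):
    return word.count("a") % 2 == 0


def ends_with_ab(word):
    return word.endswith("ab")


print(is_closed(even_number_of_a, [""], [""], "ab"))
print(is_closed(even_number_of_a, ["", "a"], [""], "ab"))
print(is_consistent(ends_with_ab, ["", "a"], [""], "ab"))
```

In the last call, the prefixes "" and "a" have the same row for $S = \{\varepsilon\}$ but differ after appending b, so the table is inconsistent and $S$ must be extended.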


3 A Myhill-Nerode Style Characterization 
of Recognizable Timed Languages with Elementary 
Languages 


Unlike the case of regular languages, no finite set of timed words can correctly distinguish recognizable timed languages due to the infiniteness of dwell time in timed words. Instead, we use a finite set of elementary languages to define a Nerode-style congruence. To define the Nerode-style congruence, we extend the notion of membership to elementary languages.

Definition 13 (symbolic membership). For a timed language $\mathcal{L} \subseteq \mathcal{T}(\Sigma)$ and an elementary language $(u, \Lambda) \in \mathcal{E}(\Sigma)$, the symbolic membership $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}}((u, \Lambda))$ of $(u, \Lambda)$ to $\mathcal{L}$ is the strongest constraint such that for any $w \in (u, \Lambda)$, we have $w \in \mathcal{L}$ if and only if $\kappa(w) \models \mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}}((u, \Lambda))$.


We discuss how to obtain symbolic membership in Sect. 4.5. We define a Nerode-style congruence using symbolic membership. A naive idea is to distinguish two elementary languages by the equivalence of their symbolic membership. However, this does not capture the semantics of TAs. For example, for the DTA $\mathcal{A}$ in Fig. 1c, for any timed word $w$, we have $1.3 \cdot a \cdot 0.4 \cdot w \in \mathcal{L}(\mathcal{A}) \iff 0.3 \cdot a \cdot 1.0 \cdot a \cdot 0.4 \cdot w \in \mathcal{L}(\mathcal{A})$, while they have different symbolic membership. This is because symbolic membership distinguishes the position in timed words where each clock variable is reset, which must be ignored. We use renaming equations to abstract such positional information in symbolic membership. Note that $\mathbb{T}_{i,n} = \sum_{k=i}^{n} \tau_k$ corresponds to the value of the clock variable reset at $\tau_i$.


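Definition 14 (renaming equation). Let $\mathbb{T} = \{\tau_0, \tau_1, \ldots, \tau_n\}$ and $\mathbb{T}' = \{\tau'_0, \tau'_1, \ldots, \tau'_{n'}\}$ be sets of ordered variables. A renaming equation $R$ over $\mathbb{T}$ and $\mathbb{T}'$ is a finite conjunction of equations of the form $\mathbb{T}_{i,n} = \mathbb{T}'_{i',n'}$, where $i \in \{0, 1, \ldots, n\}$, $i' \in \{0, 1, \ldots, n'\}$, $\mathbb{T}_{i,n} = \sum_{k=i}^{n} \tau_k$, and $\mathbb{T}'_{i',n'} = \sum_{k=i'}^{n'} \tau'_k$.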


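Definition 15 ($\sim^{S}_{\mathcal{L}}$). Let $\mathcal{L} \subseteq \mathcal{T}(\Sigma)$ be a timed language, let $(u, \Lambda), (u', \Lambda'), (u'', \Lambda'') \in \mathcal{E}(\Sigma)$ be elementary languages, and let $R$ be a renaming equation over $\mathbb{T}$ and $\mathbb{T}'$, where $\mathbb{T}$ and $\mathbb{T}'$ are the variables of $\Lambda$ and $\Lambda'$, respectively. We let $(u, \Lambda) \sqsubseteq^{(u'', \Lambda''), R}_{\mathcal{L}} (u', \Lambda')$ if we have the following:
- for any $w \in (u, \Lambda)$, there is $w' \in (u', \Lambda')$ satisfying $\kappa(w), \kappa(w') \models R$;
- $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}}((u, \Lambda) \cdot (u'', \Lambda'')) \land R \land \Lambda'$ is equivalent to $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}}((u', \Lambda') \cdot (u'', \Lambda'')) \land R \land \Lambda$.
We let $(u, \Lambda) \sim^{(u'', \Lambda''), R}_{\mathcal{L}} (u', \Lambda')$ if we have $(u, \Lambda) \sqsubseteq^{(u'', \Lambda''), R}_{\mathcal{L}} (u', \Lambda')$ and $(u', \Lambda') \sqsubseteq^{(u'', \Lambda''), R}_{\mathcal{L}} (u, \Lambda)$. Let $S \subseteq \mathcal{E}(\Sigma)$. We let $(u, \Lambda) \sim^{S, R}_{\mathcal{L}} (u', \Lambda')$ if for any $(u'', \Lambda'') \in S$, we have $(u, \Lambda) \sim^{(u'', \Lambda''), R}_{\mathcal{L}} (u', \Lambda')$. We let $(u, \Lambda) \sim^{S}_{\mathcal{L}} (u', \Lambda')$ if $(u, \Lambda) \sim^{S, R}_{\mathcal{L}} (u', \Lambda')$ for some renaming equation $R$.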


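Example 16. Let $\mathcal{A}$ be the DTA in Fig. 1c and let $(u, \Lambda)$, $(u', \Lambda')$, and $(u'', \Lambda'')$ be elementary languages, where $u = a$, $\Lambda = \{\tau_0 \in (1,2) \land \tau_0 + \tau_1 \in (1,2) \land \tau_1 \in (0,1)\}$, $u' = a \cdot a$, $\Lambda' = \{\tau'_0 \in (0,1) \land \tau'_0 + \tau'_1 \in (1,2) \land \tau'_1 + \tau'_2 \in (1,2) \land \tau'_2 \in (0,1)\}$, $u'' = a$, and $\Lambda'' = \{\tau''_0 \in (0,1) \land \tau''_1 = 0\}$. We have $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}(\mathcal{A})}((u, \Lambda) \cdot (u'', \Lambda'')) = \Lambda \land \Lambda'' \land \tau_1 + \tau''_0 < 1$ and $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}(\mathcal{A})}((u', \Lambda') \cdot (u'', \Lambda'')) = \Lambda' \land \Lambda'' \land \tau'_2 + \tau''_0 < 1$. Therefore, for the renaming equation $\mathbb{T}_{1,1} = \mathbb{T}'_{2,2}$, we have $(u, \Lambda) \sim^{(u'', \Lambda''),\, \mathbb{T}_{1,1} = \mathbb{T}'_{2,2}}_{\mathcal{L}(\mathcal{A})} (u', \Lambda')$.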


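An algorithm to check if $(u, \Lambda) \sim^{S}_{\mathcal{L}} (u', \Lambda')$ is shown in Appendix B.2 of [29]. Intuitively, $(u, \Lambda) \sqsubseteq^{s, R}_{\mathcal{L}} (u', \Lambda')$ shows that any $w \in (u, \Lambda)$ can be "simulated" by some $w' \in (u', \Lambda')$ with respect to $s$ and $R$. Such intuition is formalized as follows.

Theorem 17. For any $\mathcal{L} \subseteq \mathcal{T}(\Sigma)$ and $(u, \Lambda), (u', \Lambda'), (u'', \Lambda'') \in \mathcal{E}(\Sigma)$ satisfying $(u, \Lambda) \sqsubseteq^{(u'', \Lambda''), R}_{\mathcal{L}} (u', \Lambda')$, for any $w \in (u, \Lambda)$, there is $w' \in (u', \Lambda')$ such that for any $w'' \in (u'', \Lambda'')$, $w \cdot w'' \in \mathcal{L} \iff w' \cdot w'' \in \mathcal{L}$ holds.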


By $\bigcup_{(u, \Lambda) \in \mathcal{E}(\Sigma)} (u, \Lambda) = \mathcal{T}(\Sigma)$, we have the following as a corollary.


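Corollary 18. For any timed language $\mathcal{L} \subseteq \mathcal{T}(\Sigma)$ and for any elementary languages $(u, \Lambda), (u', \Lambda') \in \mathcal{E}(\Sigma)$, $(u, \Lambda) \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}} (u', \Lambda')$ implies the following.

- For any $w \in (u, \Lambda)$, there is $w' \in (u', \Lambda')$ such that for any $w'' \in \mathcal{T}(\Sigma)$, we have $w \cdot w'' \in \mathcal{L} \iff w' \cdot w'' \in \mathcal{L}$.

- For any $w' \in (u', \Lambda')$, there is $w \in (u, \Lambda)$ such that for any $w'' \in \mathcal{T}(\Sigma)$, we have $w \cdot w'' \in \mathcal{L} \iff w' \cdot w'' \in \mathcal{L}$.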


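The following characterizes recognizable timed languages with $\sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}}$.

Theorem 19 (Myhill-Nerode style characterization). A timed language $\mathcal{L}$ is recognizable if and only if the quotient set $\mathcal{SE}(\Sigma) / \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}}$ is finite.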


By Theorem 19, we always have a finite set S of distinguishing extensions. 


Theorem 20. For any recognizable timed language $\mathcal{L}$, there is a finite set $S$ of elementary languages satisfying $\sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}} = \sim^{S}_{\mathcal{L}}$.


4 Active Learning of Deterministic Timed Automata 


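We show our L*-style active learning algorithm for DTAs with the Nerode-style congruence in Sect. 3. We let $\mathcal{L}_{\mathrm{tgt}}$ be the target timed language in learning.

For simplicity, we first present our learning algorithm with a smart teacher answering the following three kinds of queries: membership query $\mathrm{mem}_{\mathcal{L}_{\mathrm{tgt}}}(w)$ asking whether $w \in \mathcal{L}_{\mathrm{tgt}}$, symbolic membership query asking $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}_{\mathrm{tgt}}}((u, \Lambda))$, and equivalence query $\mathrm{eq}_{\mathcal{L}_{\mathrm{tgt}}}(\mathcal{A}_{\mathrm{hyp}})$ asking whether $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$. If $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$, $\mathrm{eq}_{\mathcal{L}_{\mathrm{tgt}}}(\mathcal{A}_{\mathrm{hyp}}) = \top$, and otherwise, $\mathrm{eq}_{\mathcal{L}_{\mathrm{tgt}}}(\mathcal{A}_{\mathrm{hyp}})$ is a timed word $\mathit{cex} \in \mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \triangle \mathcal{L}_{\mathrm{tgt}}$. Later in Sect. 4.5, we show how to answer a symbolic membership query with finitely many membership queries. Our task is to construct a DTA $\mathcal{A}$ satisfying $\mathcal{L}(\mathcal{A}) = \mathcal{L}_{\mathrm{tgt}}$ with finitely many queries.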


4.1 Successors of Simple Elementary Languages 


Similarly to the L* algorithm in Sect. 2.3, we learn a DTA with an observation table. Reflecting the extension of the underlying congruence, we use sets of simple elementary languages for the indices. To define the extensions, $P \cup (P \cdot \Sigma)$ in the L* algorithm, we introduce continuous and discrete successors for simple elementary languages, which are inspired by regions [4]. We note that immediate exteriors do not work for this purpose. For example, for $(u, \Lambda) = (a, \{\mathbb{T}_{0,1} < 2 \land \mathbb{T}_{1,1} < 1\})$ and $w = 0.7 \cdot a \cdot 0.9$, we have $w \in (u, \Lambda)$ and $\mathrm{ext}^t((u, \Lambda)) = (a, \{\mathbb{T}_{0,1} = 2 \land \mathbb{T}_{1,1} < 1\})$, but there is no $t > 0$ satisfying $w \cdot t \in \mathrm{ext}^t((u, \Lambda))$.

Algorithm 2: DTA construction from a timed observation table
  Input: A cohesive timed observation table $(P, S, T)$
  Output: A DTA $\mathcal{A}_{\mathrm{hyp}}$ row-faithful to the given timed observation table
  Function MakeDTA($P, S, T$):
    $\Phi \leftarrow \emptyset$; $F \leftarrow \{(u, \Lambda) \in P \mid T((u, \Lambda), (\varepsilon, \tau'_0 = 0)) = \{\Lambda \land \tau'_0 = 0\}\}$
    for $p \in P$ such that $\mathrm{succ}^t(p) \notin P$ (resp. $\mathrm{succ}^a(p) \notin P$) do
      // Construct $(u, \Lambda, u', \Lambda', R)$ for some $p' \in P$ and $R$
      // Such $R$ is chosen using an exhaustive search
      pick $p' \in P$ and $R$ such that $\mathrm{succ}^t(p) \sim^{S, R}_{\mathcal{L}_{\mathrm{tgt}}} p'$ (resp. $\mathrm{succ}^a(p) \sim^{S, R}_{\mathcal{L}_{\mathrm{tgt}}} p'$)
      add $(u, \Lambda, u', \Lambda', R)$ to $\Phi$, where $(u, \Lambda) = \mathrm{ext}^t(p)$ (resp. $\mathrm{ext}^a(p)$) and $(u', \Lambda') = p'$
    return the DTA $\mathcal{A}_{\mathrm{hyp}}$ obtained from $(P, F, \Phi)$ by the construction in [21]


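For any $(u, \Lambda) \in \mathcal{SE}(\Sigma)$, we let $\Theta_{(u, \Lambda)}$ be the total order over 0 and the fractional parts $\mathrm{frac}(\mathbb{T}_{0,n}), \mathrm{frac}(\mathbb{T}_{1,n}), \ldots, \mathrm{frac}(\mathbb{T}_{n,n})$ of $\mathbb{T}_{0,n}, \mathbb{T}_{1,n}, \ldots, \mathbb{T}_{n,n}$. Such an order is uniquely defined because $\Lambda$ is simple and canonical (Proposition 36 of [29]).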


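Definition 21 (successor). Let $p = (u, \Lambda) \in \mathcal{SE}(\Sigma)$ be a simple elementary language. The discrete successor $\mathrm{succ}^a(p)$ of $p$ is $\mathrm{succ}^a(p) = (u \cdot a, \Lambda \land \tau_{n+1} = 0)$. The continuous successor $\mathrm{succ}^t(p)$ of $p$ is $\mathrm{succ}^t(p) = (u, \Lambda^t)$, where $\Lambda^t$ is defined as follows: if there is an equation $\mathbb{T}_{i,n} = d$ in $\Lambda$, all such equations are replaced with $\mathbb{T}_{i,n} \in (d, d+1)$; otherwise, for each greatest $\mathbb{T}_{i,n}$ in terms of $\Theta_{(u, \Lambda)}$, we replace $\mathbb{T}_{i,n} \in (d, d+1)$ with $\mathbb{T}_{i,n} = d + 1$. We let $\mathrm{succ}(p) = \bigcup_{a \in \Sigma} \mathrm{succ}^a(p) \cup \mathrm{succ}^t(p)$. For $P \subseteq \mathcal{SE}(\Sigma)$, we let $\mathrm{succ}(P) = \bigcup_{p \in P} \mathrm{succ}(p)$.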


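Example 22. Let $u = aa$ and $\Lambda = \{\mathbb{T}_{0,0} \in (1,2) \land \mathbb{T}_{0,1} \in (1,2) \land \mathbb{T}_{0,2} \in (1,2) \land \mathbb{T}_{1,1} \in (0,1) \land \mathbb{T}_{1,2} \in (0,1) \land \mathbb{T}_{2,2} = 0\}$. The order $\Theta_{(u, \Lambda)}$ is such that $0 = \mathrm{frac}(\mathbb{T}_{2,2}) < \mathrm{frac}(\mathbb{T}_{1,2}) < \mathrm{frac}(\mathbb{T}_{0,2})$. The continuous successor of $(u, \Lambda)$ is $\mathrm{succ}^t((u, \Lambda)) = (u, \Lambda^t)$, where $\Lambda^t = \{\mathbb{T}_{0,0} \in (1,2) \land \mathbb{T}_{0,1} \in (1,2) \land \mathbb{T}_{0,2} \in (1,2) \land \mathbb{T}_{1,1} \in (0,1) \land \mathbb{T}_{1,2} \in (0,1) \land \mathbb{T}_{2,2} \in (0,1)\}$. The continuous successor of $(u, \Lambda^t)$ is $\mathrm{succ}^t((u, \Lambda^t)) = (u, \Lambda^{tt})$, where $\Lambda^{tt} = \{\mathbb{T}_{0,0} \in (1,2) \land \mathbb{T}_{0,1} \in (1,2) \land \mathbb{T}_{0,2} = 2 \land \mathbb{T}_{1,1} \in (0,1) \land \mathbb{T}_{1,2} \in (0,1) \land \mathbb{T}_{2,2} \in (0,1)\}$.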
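The continuous successor of Definition 21 can be sketched on a region-like representation. This is an illustration under simplifying assumptions, not the representation used in the author's implementation: only the variables $\mathbb{T}_{0,n}, \ldots, \mathbb{T}_{n,n}$ are tracked, each by its integer part $d$, together with the set of indices whose fractional part is 0 and the ascending order of the remaining fractional parts. The function names are hypothetical.

```python
def continuous_successor(int_parts, zero, order):
    """One step of time elapse.

    int_parts[i]: integer part d of T_{i,n}
    zero:         indices i with T_{i,n} = d (fractional part 0)
    order:        tuple of index sets, ascending by (positive) fractional part
    """
    int_parts = list(int_parts)
    if zero:
        # Every equation T_{i,n} = d becomes T_{i,n} in (d, d+1);
        # these variables now have the smallest positive fractional part.
        return int_parts, frozenset(), (frozenset(zero),) + tuple(order)
    if not order:
        raise ValueError("no variable to advance")
    # Otherwise the variables with the greatest fractional part reach d + 1.
    greatest = order[-1]
    for i in greatest:
        int_parts[i] += 1
    return int_parts, frozenset(greatest), tuple(order[:-1])


def describe(int_parts, zero, order):
    """Render the constraint on each T_{i,n} as text."""
    return [f"= {d}" if i in zero else f"in ({d}, {d + 1})"
            for i, d in enumerate(int_parts)]


# Example 22: T_{0,2} in (1,2), T_{1,2} in (0,1), T_{2,2} = 0,
# with 0 = frac(T_{2,2}) < frac(T_{1,2}) < frac(T_{0,2}).
start = ([1, 0, 0], frozenset({2}), (frozenset({1}), frozenset({0})))
first = continuous_successor(*start)
second = continuous_successor(*first)
print(describe(*first))
print(describe(*second))
```

The two steps reproduce the constraints on $\mathbb{T}_{0,2}$, $\mathbb{T}_{1,2}$, and $\mathbb{T}_{2,2}$ in $\Lambda^t$ and in the second successor of Example 22.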


4.2 Timed Observation Table for Active DTA Learning 


We extend the observation table with (simple) elementary languages and sym- 
bolic membership to learn a recognizable timed language. 


Definition 23 (timed observation table). A timed observation table is a 3-tuple $(P, S, T)$ such that: $P$ is a prefix-closed finite set of simple elementary languages, $S$ is a finite set of elementary languages, and $T$ is a function mapping $(p, s) \in (P \cup \mathrm{succ}(P)) \times S$ to the symbolic membership $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}_{\mathrm{tgt}}}(p \cdot s)$.


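Figure 2 illustrates timed observation tables: each cell indexed by $(p, s)$ shows the symbolic membership $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}_{\mathrm{tgt}}}(p \cdot s)$. For timed observation tables, we extend the notion of closedness and consistency with $\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$ we introduced in Sect. 3.

Algorithm 3: Counterexample analysis in our DTA learning algorithm
  1 Function AnalyzeCEX($\mathit{cex}$):
  2   $i \leftarrow 0$; $w_0 \leftarrow \mathit{cex}$
  3   while $\nexists p \in P.\ w_i \in p$ do
  4     $i \leftarrow i + 1$
  5     split $w_{i-1}$ into $w'_i \cdot w''_i$ such that $w'_i \in p_i$ for some $p_i \in \mathrm{succ}(P) \setminus P$
  6     let $p'_i \in P$ and $R_i$ be such that $p_i \sim^{S, R_i}_{\mathcal{L}_{\mathrm{tgt}}} p'_i$
  7     let $\overline{w}_i \in p'_i$ be such that $\kappa(w'_i), \kappa(\overline{w}_i) \models R_i$
  8     $w_i \leftarrow \overline{w}_i \cdot w''_i$
  9   find $j \in \{1, 2, \ldots, i\}$ such that $w_{j-1} \in \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$ and $w_j \notin \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$
      // We use a binary search with membership queries for $\lceil \log(i) \rceil$ times.
  10  return the simple elementary language including $w''_j$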


We note that consistency is defined only for discrete successors. This is because 
a timed observation table does not always become “consistent” for continuous 
successors. See Appendix C of [29] for a detailed discussion. We also require 
exterior-consistency since we construct an exterior from a successor. 


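Definition 24 (closedness, consistency, exterior-consistency, cohesion). Let $O = (P, S, T)$ be a timed observation table. $O$ is closed if, for each $p \in \mathrm{succ}(P) \setminus P$, there is $p' \in P$ satisfying $p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p'$. $O$ is consistent if, for each $p, p' \in P$ and for each $a \in \Sigma$, $p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p'$ implies $\mathrm{succ}^a(p) \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} \mathrm{succ}^a(p')$. $O$ is exterior-consistent if for any $p \in P$, $\mathrm{succ}^t(p) \notin P$ implies $\mathrm{succ}^t(p) \subseteq \mathrm{ext}^t(p)$. $O$ is cohesive if it is closed, consistent, and exterior-consistent.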


From a cohesive timed observation table $(P, S, T)$, we can construct a DTA as outlined in Algorithm 2. We construct a DTA via a recognizable timed language. The main ideas are as follows: 1) we approximate $\sim^{\mathcal{E}(\Sigma), R}_{\mathcal{L}_{\mathrm{tgt}}}$ by $\sim^{S, R}_{\mathcal{L}_{\mathrm{tgt}}}$; 2) we decide the equivalence class of $\mathrm{ext}^t(p) \in \mathrm{ext}(P) \setminus P$ in $\mathcal{E}(\Sigma)$ from successors. Notice that there can be multiple renaming equations $R$ showing $\sim^{S, R}_{\mathcal{L}_{\mathrm{tgt}}}$. We use one of them found by an exhaustive search in Appendix B.2 of [29].

The DTA obtained by MakeDTA is faithful to the timed observation table in rows, i.e., for any $p \in P \cup \mathrm{succ}(P)$, $\mathcal{L}_{\mathrm{tgt}} \cap p = \mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \cap p$. However, unlike the L* algorithm, this does not hold for each cell, i.e., there may be $p \in P \cup \mathrm{succ}(P)$ and $s \in S$ satisfying $\mathcal{L}_{\mathrm{tgt}} \cap (p \cdot s) \neq \mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \cap (p \cdot s)$. This is because we do not (and actually cannot) enforce consistency for continuous successors. See Appendix C of [29] for a discussion. Nevertheless, this does not affect the correctness of our algorithm, partly by Theorem 26.


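Theorem 25 (row faithfulness). For any cohesive timed observation table $(P, S, T)$, for any $p \in P \cup \mathrm{succ}(P)$, $\mathcal{L}_{\mathrm{tgt}} \cap p = \mathcal{L}(\mathrm{MakeDTA}(P, S, T)) \cap p$ holds.

Theorem 26. For any cohesive timed observation table $(P, S, T)$, $\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} = \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}$ implies $\mathcal{L}_{\mathrm{tgt}} = \mathcal{L}(\mathrm{MakeDTA}(P, S, T))$.

Algorithm 4: Outline of our L*-style algorithm for DTA learning
  1 $P \leftarrow \{(\varepsilon, \tau_0 = 0)\}$; $S \leftarrow \{(\varepsilon, \tau'_0 = 0)\}$
  2 while $\top$ do
  3   while $(P, S, T)$ is not cohesive do
  4     if $\exists p \in \mathrm{succ}(P) \setminus P.\ \nexists p' \in P.\ p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p'$ then // $(P, S, T)$ is not closed
  5       $P \leftarrow P \cup \{p\}$ // Add such $p$ to $P$
  6     else if $\exists p, p' \in P, a \in \Sigma.\ p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p' \land \mathrm{succ}^a(p) \not\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} \mathrm{succ}^a(p')$ then
          // $(P, S, T)$ is inconsistent due to $a$
  7       let $S' \subseteq S$ be a minimal set such that $p \not\sim^{S \cup \{\{a \cdot w \mid w \in s\} \mid s \in S'\}}_{\mathcal{L}_{\mathrm{tgt}}} p'$
  8       $S \leftarrow S \cup \{\{a \cdot w \mid w \in s\} \mid s \in S'\}$
  9     else // $(P, S, T)$ is not exterior-consistent
  10      $P \leftarrow P \cup \{p' \in \mathrm{succ}^t(P) \setminus P \mid \exists p \in P.\ p' = \mathrm{succ}^t(p) \land p' \not\subseteq \mathrm{ext}^t(p)\}$
  11    fill $T$ using symbolic membership queries
  12  $\mathcal{A}_{\mathrm{hyp}} \leftarrow \mathrm{MakeDTA}(P, S, T)$
  13  if $\mathit{cex} = \mathrm{eq}_{\mathcal{L}_{\mathrm{tgt}}}(\mathcal{A}_{\mathrm{hyp}})$ then
  14    add AnalyzeCEX($\mathit{cex}$) to $S$
  15  else return $\mathcal{A}_{\mathrm{hyp}}$ // It returns $\mathcal{A}_{\mathrm{hyp}}$ if $\mathrm{eq}_{\mathcal{L}_{\mathrm{tgt}}}(\mathcal{A}_{\mathrm{hyp}}) = \top$.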


4.3 Counterexample Analysis 


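We analyze the counterexample $\mathit{cex}$ obtained by an equivalence query to refine the set $S$ of suffixes in a timed observation table. Our analysis, outlined in Algorithm 3, is inspired by the Rivest-Schapire algorithm [20,25]. The idea is to reduce the counterexample $\mathit{cex}$ using the mapping defined by the congruence $\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$ (lines 5-7), much like $\Phi$ in recognizable timed languages, and to find a suffix $s$ strictly refining $S$ (line 9), i.e., satisfying $p \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}} p'$ and $p \not\sim^{S \cup \{s\}}_{\mathcal{L}_{\mathrm{tgt}}} p'$ for some $p \in \mathrm{succ}(P)$ and $p' \in P$.

By definition of $\mathit{cex}$, we have $\mathit{cex} = w_0 \in \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$. By Theorem 25, $w_n \notin \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$ holds, where $n$ is the final value of $i$. By construction of $\mathcal{A}_{\mathrm{hyp}}$ and $w_i$, for any $i \in \{1, 2, \ldots, n\}$, we have $w_0 \in \mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) \iff w_i \in \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$. Therefore, there is $i \in \{1, 2, \ldots, n\}$ satisfying $w_{i-1} \in \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$ and $w_i \notin \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$. For such $i$, since we have $w_{i-1} = w'_i \cdot w''_i \in \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$, $w_i = \overline{w}_i \cdot w''_i \notin \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$, and $\kappa(w'_i), \kappa(\overline{w}_i) \models R_i$, such $w''_i$ is a witness of $p_i \not\sim^{\mathcal{E}(\Sigma), R_i}_{\mathcal{L}_{\mathrm{tgt}}} p'_i$. Therefore, $S$ can be refined by the simple elementary language $s \in \mathcal{SE}(\Sigma)$ including $w''_i$.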
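The search in line 9 of Algorithm 3 only needs the two endpoint facts established above: $w_0$ is in the symmetric difference and $w_n$ is not. A minimal sketch of such a binary search, assuming a predicate `in_diff(k)` (a hypothetical name) that decides whether $w_k \in \mathcal{L}_{\mathrm{tgt}} \triangle \mathcal{L}(\mathcal{A}_{\mathrm{hyp}})$, e.g., by one membership query plus one run of the hypothesis:

```python
def find_flip(in_diff, n):
    """Return j in {1, ..., n} with in_diff(j - 1) true and in_diff(j) false.

    Precondition: in_diff(0) is true and in_diff(n) is false.
    The intermediate values need not be monotone.
    """
    lo, hi = 0, n  # invariant: in_diff(lo) is true, in_diff(hi) is false
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if in_diff(mid):
            lo = mid
        else:
            hi = mid
    return hi


# A non-monotone sequence of answers for w_0, ..., w_4.
flags = [True, True, False, True, False]
queries = []


def in_diff(k):
    queries.append(k)  # record each evaluation to count the queries
    return flags[k]


j = find_flip(in_diff, len(flags) - 1)
print(j, len(queries))
```

The invariant guarantees that some flip lies between `lo` and `hi`, so the search is correct even when the sequence flips more than once; it evaluates the predicate at most $\lceil \log_2 n \rceil$ times.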


4.4 L*-Style Learning Algorithm for DTAs 


Algorithm 4 outlines our active DTA learning algorithm. At line 1, we initialize the timed observation table with $P = \{(\varepsilon, \tau_0 = 0)\}$ and $S = \{(\varepsilon, \tau'_0 = 0)\}$. In the loop in lines 2-15, we refine the timed observation table until the hypothesis DTA $\mathcal{A}_{\mathrm{hyp}}$ recognizes the target language $\mathcal{L}_{\mathrm{tgt}}$, which is checked by equivalence queries. The refinement finishes when the equivalence relation $\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$ defined by the suffixes $S$ converges to $\sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}$ and the prefixes $P$ cover $\mathcal{SE}(\Sigma) / \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}$.

In the loop in lines 3-11, we make the timed observation table cohesive. If the timed observation table is not closed, we move the incompatible row in $\mathrm{succ}(P) \setminus P$ to $P$ (line 5). If the timed observation table is inconsistent, we concatenate an event $a \in \Sigma$ in front of some of the suffixes in $S$ (line 8). If the timed observation table is not exterior-consistent, we move the boundary $\mathrm{succ}^t(p) \in \mathrm{succ}^t(P) \setminus P$ satisfying $\mathrm{succ}^t(p) \not\subseteq \mathrm{ext}^t(p)$ to $P$ (line 10). Once we obtain a cohesive timed observation table, we construct a DTA $\mathcal{A}_{\mathrm{hyp}} = \mathrm{MakeDTA}(P, S, T)$ and make an equivalence query (line 12). If we have $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$, we return $\mathcal{A}_{\mathrm{hyp}}$. Otherwise, we have a timed word $\mathit{cex}$ witnessing the difference between the language of the hypothesis DTA $\mathcal{A}_{\mathrm{hyp}}$ and the target language $\mathcal{L}_{\mathrm{tgt}}$. We refine the timed observation table using Algorithm 3.

[Figure 2, panels: (a) The initial timed observation table $O_1 = (P_1, S_1, T_1)$; (b) DTA $\mathcal{A}^1_{\mathrm{hyp}}$ constructed from $O_1$; (c) DTA $\mathcal{A}^3_{\mathrm{hyp}}$ constructed from $O_3$; (d) Timed observation table $O_2 = (P_2, S_2, T_2)$ after processing $\mathit{cex}$; (e) The final observation table $O_3 = (P_3, S_3, T_3)$]

Fig. 2. Timed observation tables $O_1$, $O_2$, $O_3$, and the DTAs $\mathcal{A}^1_{\mathrm{hyp}}$ and $\mathcal{A}^3_{\mathrm{hyp}}$ made from $O_1$ and $O_3$, respectively. In $O_2$ and $O_3$, we only show the constraints non-trivial from $p$ and $s$. The DTAs are simplified without changing the language. The use of clock assignments, which does not change the expressiveness, is from [21].


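Example 27. Let $\mathcal{L}_{\mathrm{tgt}}$ be the timed language recognized by the DTA in Fig. 1c. We start from $P = \{(\varepsilon, \tau_0 = 0)\}$ and $S = \{(\varepsilon, \tau'_0 = 0)\}$. Figure 2a shows the initial timed observation table $O_1$. Since the timed observation table $O_1$ in Fig. 2a is cohesive, we construct a hypothesis DTA $\mathcal{A}^1_{\mathrm{hyp}}$. The hypothesis recognizable timed language $(P_1, F_1, \Phi_1)$ is such that $P_1 = F_1 = \{(\varepsilon, \tau_0 = 0)\}$ and $\Phi_1 = \{(\varepsilon, \tau_0 > 0, \varepsilon, \tau_0 = 0, \top), (a, \tau_0 = \tau_0 + \tau_1 = \tau_1 = 0, \varepsilon, \tau_0 = 0, \top)\}$. Figure 2b shows the first hypothesis DTA $\mathcal{A}^1_{\mathrm{hyp}}$.

We have $\mathcal{L}(\mathcal{A}^1_{\mathrm{hyp}}) \neq \mathcal{L}_{\mathrm{tgt}}$, and the learner obtains a counterexample, e.g., $\mathit{cex} = 1.0 \cdot a \cdot 0$, with an equivalence query. In Algorithm 3, we have $w_0 = \mathit{cex}$, $w_1 = 0.5 \cdot a \cdot 0$, $w_2 = 0 \cdot a \cdot 0$, and $w_3 = 0$. We have $w_0 \in \mathcal{L}(\mathcal{A}^1_{\mathrm{hyp}}) \triangle \mathcal{L}_{\mathrm{tgt}}$ and $w_1 \notin \mathcal{L}(\mathcal{A}^1_{\mathrm{hyp}}) \triangle \mathcal{L}_{\mathrm{tgt}}$, and the suffix to distinguish $w_0$ and $w_1$ is $0.5 \cdot a \cdot 0$. Thus, we add $s_1 = (a, \tau'_1 = 0 < \tau'_0 = \tau'_0 + \tau'_1 < 1)$ to $S_1$ (Fig. 2d).

In Fig. 2d, we observe that $T_2(p_1, s_1)$ is more strict than $T_2(p_0, s_1)$, and we have $p_1 \not\sim^{S_2}_{\mathcal{L}_{\mathrm{tgt}}} p_0$. To make $(P_2, S_2, T_2)$ closed, we add $p_1$ to $P_2$. By repeating similar operations, we obtain the timed observation table $O_3 = (P_3, S_3, T_3)$ in Fig. 2e, which is cohesive. Figure 2c shows the DTA $\mathcal{A}^3_{\mathrm{hyp}}$ constructed from $O_3$. Since $\mathcal{L}(\mathcal{A}^3_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$ holds, Algorithm 4 finishes returning $\mathcal{A}^3_{\mathrm{hyp}}$.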


By the use of equivalence queries, Algorithm 4 returns a DTA recognizing the target language if it terminates, which is formally stated as follows.

Theorem 28 (correctness). For any target timed language $\mathcal{L}_{\mathrm{tgt}}$, if Algorithm 4 terminates, for the resulting DTA $\mathcal{A}_{\mathrm{hyp}}$, $\mathcal{L}(\mathcal{A}_{\mathrm{hyp}}) = \mathcal{L}_{\mathrm{tgt}}$ holds.


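Moreover, Algorithm 4 terminates for any recognizable timed language $\mathcal{L}_{\mathrm{tgt}}$, essentially because of the finiteness of $\mathcal{SE}(\Sigma) / \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}$.

Theorem 29 (termination). For any recognizable timed language $\mathcal{L}_{\mathrm{tgt}}$, Algorithm 4 terminates and returns a DTA $\mathcal{A}$ satisfying $\mathcal{L}(\mathcal{A}) = \mathcal{L}_{\mathrm{tgt}}$.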


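Proof (Theorem 29). By the recognizability of $\mathcal{L}_{\mathrm{tgt}}$ and Theorem 19, $\mathcal{SE}(\Sigma) / \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}$ is finite. Let $N = |\mathcal{SE}(\Sigma) / \sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}|$. Since each execution of line 5 adds $p$ to $P$, where $p$ is such that for any $p' \in P$, $p \not\sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}} p'$ holds, it is executed at most $N$ times. Since each execution of line 8 refines $S$, i.e., it increases $|\mathcal{SE}(\Sigma) / \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}|$, line 8 is executed at most $N$ times. For any $(u, \Lambda) \in \mathcal{SE}(\Sigma)$, if $\Lambda$ contains $\mathbb{T}_{i,|u|} = d$ for some $i \in \{0, 1, \ldots, |u|\}$ and $d \in \mathbb{N}$, we have $\mathrm{succ}^t((u, \Lambda)) \subseteq \mathrm{ext}^t((u, \Lambda))$. Therefore, line 10 is executed at most $N$ times. Since $S$ is strictly refined in line 14, i.e., it increases $|\mathcal{SE}(\Sigma) / \sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}|$, line 14 is executed at most $N$ times. By Theorem 26, once $\sim^{S}_{\mathcal{L}_{\mathrm{tgt}}}$ saturates to $\sim^{\mathcal{E}(\Sigma)}_{\mathcal{L}_{\mathrm{tgt}}}$, MakeDTA returns the correct DTA. Overall, Algorithm 4 terminates.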


4.5 Learning with a Normal Teacher 


We briefly show how to learn a DTA only with membership and equivalence queries. We reduce a symbolic membership query to finitely many membership queries, answerable by a normal teacher. See Appendix B.1 of [29] for details.

Let $(u, \Lambda)$ be the elementary language given in a symbolic membership query. Since $\Lambda$ is bounded, we can construct a finite and disjoint set of simple and canonical timed conditions $\Lambda'_1, \Lambda'_2, \ldots, \Lambda'_n$ satisfying $\bigvee_{1 \leq i \leq n} \Lambda'_i = \Lambda$ by a simple enumeration. For any simple elementary language $(u', \Lambda') \in \mathcal{SE}(\Sigma)$ and timed words $w, w' \in (u', \Lambda')$, we have $w \in \mathcal{L} \iff w' \in \mathcal{L}$. Thus, we can construct $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}}((u, \Lambda))$ by making a membership query $\mathrm{mem}_{\mathcal{L}}(w)$ for each such $(u', \Lambda') \subseteq (u, \Lambda)$ and for some $w \in (u', \Lambda')$. We need such an exhaustive search, instead of a binary search, because $\mathrm{mem}^{\mathrm{sym}}_{\mathcal{L}}((u, \Lambda))$ may be non-convex.

Assume $\Lambda$ is a canonical timed condition. Let $M$ be the number of variables in $\Lambda$ and $I$ be the largest difference between the upper bound and the lower bound for some $\mathbb{T}_{i,j}$ in $\Lambda$. The size $n$ of the above decomposition is bounded by $(2 \times I + 1)^{\frac{1}{2} \times M \times (M+1)}$, which exponentially blows up with respect to $M$.

In our algorithm, we only make symbolic membership queries with elementary languages of the form $p \cdot s$, where $p$ and $s$ are simple elementary languages. Therefore, $I$ is at most 2. However, even with such an assumption, the number of the necessary membership queries blows up exponentially with respect to the number of variables in $\Lambda$.


4.6 Complexity Analysis 


After each equivalence query, our DTA learning algorithm strictly refines $S$ or terminates. Thus, the number of equivalence queries is at most $N$. In the proof of Theorem 29, we observe that the size of $P$ is at most $2N$. Therefore, the number $(|P| + |\mathrm{succ}(P)|) \times |S|$ of the cells in the timed observation table is at most $(2N + 2N \times (|\Sigma| + 1)) \times N = 2N^2(|\Sigma| + 2)$. Let $J$ be the upper bound of $i$ in the analysis of $\mathit{cex}$ returned by equivalence queries (Algorithm 3). For each equivalence query, the number of membership queries in Algorithm 3 is bounded by $\lceil \log J \rceil$, and thus, it is, in total, bounded by $N \times \lceil \log J \rceil$. Therefore, if the learner can use symbolic membership queries, the total number of queries is bounded by a polynomial of $N$ and $J$. In Sect. 4.5, we observe that the number of membership queries to implement a symbolic membership query is at most exponential in $M$. Since $P$ is prefix-closed, $M$ is at most $N$. Overall, if the learner cannot use symbolic membership queries, the total number of queries is at most exponential in $N$.
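The bound on the number of cells can be expanded step by step. The derivation below is a sketch that reads the three ingredients off the formula in the text: $|P| \leq 2N$, at most $|\Sigma| + 1$ successors per element of $P$ (one per event plus the continuous one), and $|S| \leq N$; the last two are assumptions inferred from the stated expression rather than separately proved here.

```latex
\begin{align*}
(|P| + |\mathrm{succ}(P)|) \times |S|
  &\leq \bigl(2N + 2N \times (|\Sigma| + 1)\bigr) \times N \\
  &= \bigl(2N|\Sigma| + 4N\bigr) \times N \\
  &= 2N^2 (|\Sigma| + 2)
\end{align*}
```

In particular, the table size is quadratic in $N$ and linear in the alphabet size.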


Table 1. Summary of the results for Random. Each row index $|L|$_$|\Sigma|$_$K_C$ shows the number of locations, the alphabet size, and the upper bound of the maximum constant in the guards, respectively. The column "count" shows the number of instances finished in 3 h. Cells with the best results are highlighted.

Class  | Tool    | Mem. queries (max) | Mem. queries (mean) | Mem. queries (min) | Eq. queries (max) | Eq. queries (mean) | Eq. queries (min) | Exec. time [sec.] (max) | Exec. time [sec.] (mean) | Exec. time [sec.] (min) | count
3_2_10 | LEARNTA | 35,268    | 14,241  | 2,830   | 11 | 6  | 4  | 2.32e+00 | 6.68e-01 | 4.50e-02 | 10/10
3_2_10 | ONESMT  | 468       | 205     | 32      | 13 | 8  | 5  | 9.58e-01 | 2.89e-01 | 6.58e-02 | 10/10
4_2_10 | LEARNTA | 194,442   | 55,996  | 10,619  | 14 | 7  | 4  | 2.65e+01 | 7.98e+00 | 4.88e-01 | 10/10
4_2_10 | ONESMT  | 985       | 451     | 255     | 16 | 12 | 7  | 8.53e-01 | 2.09e-01 | 1.27e-01 | 10/10
4_4_20 | LEARNTA | 1,681,769 | 858,759 | 248,399 | 21 | 15 | 10 | 8.34e+03 | 1.41e+03 | 3.23e+01 | 8/10
4_4_20 | ONESMT  | 5,329     | 3,497   | 1,740   | 42 | 32 | 26 | 2.19e+00 | 1.42e+00 | 8.27e-01 | 10/10
5_2_10 | LEARNTA | 627,980   | 119,906 | 8,121   | 19 | 8  | 5  | 1.67e+02 | 2.28e+01 | 1.96e-01 | 10/10
5_2_10 | ONESMT  | 1,332     | 876     | 359     | 22 | 16 | 12 | 5.20e-01 | 3.66e-01 | 2.58e-01 | 10/10
6_2_10 | LEARNTA | 555,939   | 106,478 | 2,912   | 14 | 9  | 6  | 2.44e+02 | 2.81e+01 | 4.40e-02 | 10/10
6_2_10 | ONESMT  | 3,929     | 1,894   | 104     | 35 | 20 | 11 | 1.72e+00 | 8.01e-01 | 1.73e-01 | 10/10


Let $\mathcal{A}_{\mathrm{tgt}} = (\Sigma, L, l_0, C, I, \Delta, F)$ be a DTA recognizing $\mathcal{L}_{\mathrm{tgt}}$. As we observe in the proof of Lemma 33 of [29], $N$ is bounded by the size of the state space of the region automaton [4] of $\mathcal{A}_{\mathrm{tgt}}$, i.e., $N$ is at most $|C|! \times 2^{|C|} \times \prod_{c \in C} (2K_c + 2) \times |L|$, where $K_c$ is the largest constant compared with $c \in C$ in $\mathcal{A}_{\mathrm{tgt}}$. Thus, without symbolic membership queries, the total number of queries is at most doubly-exponential to $|C|$ and singly exponential to $|L|$. We remark that when $|C| = 1$, the total number of queries is at most singly exponential to $|L|$ and $K_c$, which coincides with the worst-case complexity of the one-clock DTA learning algorithm in [30].
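To get a feeling for how quickly the bound on $N$ grows, it can be evaluated directly. A minimal sketch, assuming the DTA is summarized by its number of locations and one maximum constant $K_c$ per clock; `region_bound` is a hypothetical helper name.

```python
from math import factorial, prod


def region_bound(num_locations, max_constants):
    """|C|! * 2^|C| * prod_c (2 * K_c + 2) * |L| for the given per-clock constants."""
    num_clocks = len(max_constants)
    return (factorial(num_clocks)
            * 2 ** num_clocks
            * prod(2 * k + 2 for k in max_constants)
            * num_locations)


# One clock with K_c = 10 and three locations.
print(region_bound(3, [10]))
# Two clocks with K_c = 2 and K_c = 4 and five locations.
print(region_bound(5, [2, 4]))
```

The factorial and power-of-two factors depend only on the number of clocks, which is why adding a clock is much more costly than adding a location.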


5 Experiments 


We experimentally evaluated our DTA learning algorithm using our prototype library LEARNTA⁴ implemented in C++. In LEARNTA, the equivalence queries are answered by a zone-based reachability analysis using the fact that DTAs are closed under complement [4]. We pose the following research questions.


RQ1 How does LEARNTA scale with respect to the language complexity?
RQ2 How efficient is LEARNTA on practical benchmarks?


For the benchmarks with one clock variable, we compared LEARNTA with 
one of the latest one-clock DTA learning algorithms [1,30], which we call 
ONESMT. ONESMT is implemented in Python with Z3 [23] for constraint solv- 
ing. 

For each execution, we measured the number of queries and the total execution time, including the time to answer the queries. For the number of queries, we report the number with memoization, i.e., we count the number of the queried timed words (for membership queries) and the counterexamples (for equivalence queries). We conducted all the experiments on a computing server with an Intel Core i9-10980XE and 125 GiB of RAM running Ubuntu 20.04.5 LTS. We used 3 h as the timeout.
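Counting queries "with memoization" means that asking about the same timed word twice is counted once. The following is a minimal sketch of such a counter, not the mechanism used in LEARNTA or ONESMT; the class `CountingTeacher` and the toy membership predicate are hypothetical, and a timed word is assumed to be a sequence of hashable items.

```python
class CountingTeacher:
    """Wraps a membership predicate and counts distinct queried words."""

    def __init__(self, member):
        self._member = member
        self._cache = {}

    def membership(self, word):
        key = tuple(word)  # make the word hashable
        if key not in self._cache:
            self._cache[key] = self._member(word)
        return self._cache[key]

    @property
    def num_membership_queries(self):
        return len(self._cache)


# Toy target: timed words (alternating dwell times and events) whose total dwell time is below 1.
def total_time_below_one(word):
    return sum(x for x in word if not isinstance(x, str)) < 1


teacher = CountingTeacher(total_time_below_one)
teacher.membership([0.5, "a", 0.25])
teacher.membership([0.5, "a", 0.25])  # repeated: answered from the cache
teacher.membership([0.75, "a", 0.5])
print(teacher.num_membership_queries)
```

Only the first occurrence of each word reaches the underlying predicate, so the reported count equals the number of distinct queried words.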


[Figure 3, panels: (a) Membership queries; (b) Membership queries (log scale); (c) Equivalence queries. Horizontal axis: # of locations. Series: LEARNTA and ONESMT]

Fig. 3. The number of locations and the number of queries for $|L|$_2_10 in Random, where $|L| \in \{3, 4, 5, 6\}$


⁴ LEARNTA is publicly available at https://github.com/masWag/LearnTA. The artifact of the experiments is available at https://doi.org/10.5281/zenodo.7875383.

Table 2. Summary of the target DTAs and the results for Unbalanced. $|L|$ is the number of locations, $|\Sigma|$ is the alphabet size, $|C|$ is the number of clock variables, and $K_C$ is the maximum constant in the guards in the DTA.

Benchmark    | Tool    | $|L|$ | $|\Sigma|$ | $|C|$ | $K_C$ | # of Mem. queries | # of Eq. queries | Exec. time [sec.]
Unbalanced:1 | LEARNTA | 5 | 1 | 1 | 2 | 51         | 2 | 2.00e-03
Unbalanced:2 | LEARNTA | 5 | 1 | 2 | 4 | 576,142    | 3 | 3.64e+01
Unbalanced:3 | LEARNTA | 5 | 1 | 3 | 4 | 403,336    | 4 | 2.24e+01
Unbalanced:4 | LEARNTA | 5 | 1 | 4 | 6 | 4,142,835  | 5 | 2.40e+02
Unbalanced:5 | LEARNTA | 5 | 1 | 5 | 6 | 10,691,400 | 5 | 8.68e+02


5.1 RQ1: Scalability with Respect to the Language Complexity 


To evaluate the scalability of LEARNTA, we used randomly generated DTAs from [5] (denoted as Random) and our original DTAs (denoted as Unbalanced). Random consists of five classes: 3_2_10, 4_2_10, 4_4_20, 5_2_10, and 6_2_10, where each value of $|L|$_$|\Sigma|$_$K_C$ is the number of locations, the alphabet size, and the upper bound of the maximum constant in the guards in the DTAs, respectively. Each class consists of 10 randomly generated DTAs. Unbalanced is our original benchmark inspired by the "unbalanced parentheses" timed language from [10]. Unbalanced consists of five DTAs with different complexity of timing constraints. Table 2 summarizes their complexity.

Table 1 and Fig. 3 summarize the results for Random, and Table 2 summarizes the results for Unbalanced. Table 1 shows that LEARNTA requires more membership queries than ONESMT. This is likely because of the difference in the definition of prefixes and successors: ONESMT's definitions are discrete (e.g., prefixes are only with respect to events with time elapse), whereas ours are both continuous and discrete (e.g., we also consider prefixes by trimming the dwell time in the end). Since our definition makes significantly more prefixes, LEARNTA tends to require many more membership queries. Another, more high-level reason is that LEARNTA learns a DTA without knowing the number of the clock variables, and many more timed words are potentially helpful for learning. Table 1 shows that LEARNTA requires a particularly large number of membership queries for 4_4_20. This is likely because of the exponential blowup with respect to $K_C$, as discussed in Sect. 4.6. In Fig. 3, we observe that for both LEARNTA and ONESMT, the number of membership queries increases nearly exponentially with the number of locations. This coincides with the discussion in Sect. 4.6.

In contrast, Table 1 shows that LEARNTA requires fewer equivalence queries 
than ONESMT. This suggests that the cohesion in Definition 24 successfully 
detected contradictions in observation before generating a hypothesis, whereas 
ONESMT mines timing constraints mainly by equivalence queries and tends to 
require more equivalence queries. In Fig. 3c, we observe that for both LEARNTA
and ONESMT, the number of equivalence queries increases nearly linearly with the
number of locations. This also coincides with the complexity analysis in Sect. 4.6.
Figure 3c also shows that the number of equivalence queries increases faster in 
ONESMT than in LEARNTA. 

Table 3. Summary of the target DTA and the results for practical benchmarks. The
columns are the same as Table 2. Cells with the best results are highlighted.


Benchmark | Algorithm | |L| | |Σ| | |C| | K_C | # of Mem. queries | # of Eq. queries | Exec. time [sec.]
AKM       | LEARNTA   | 17  | 12  | 1   | 5   | 12,263            | 11               | 5.85e-01
AKM       | ONESMT    | 17  | 12  | 1   | 5   | 3,453             | 49               | 7.97e+00
CAS       | LEARNTA   | 14  | 10  | 1   | 27  | 66,067            | 17               | 4.65e+00
CAS       | ONESMT    | 14  | 10  | 1   | 27  | 4,769             | 18               | 9.58e+01
Light     | LEARNTA   | 5   |     | 1   | 10  | 3,057             | 7                | 3.30e-02
Light     | ONESMT    | 5   |     | 1   | 10  | 210               | 7                | 9.32e-01
PC        | LEARNTA   | 26  | 17  | 1   | 10  | 245,134           | 23               | 6.49e+01
PC        | ONESMT    | 26  | 17  | 1   | 10  | 10,390            | 29               | 1.24e+02
TCP       | LEARNTA   | 22  | 13  | 1   | 2   | 11,300            | 15               | 3.82e-01
TCP       | ONESMT    | 22  | 13  | 1   | 2   | 4,713             | 32               | 2.20e+01
Train     | LEARNTA   | 6   | 6   | 1   | 10  | 13,487            | 8                | 1.72e-01
Train     | ONESMT    | 6   | 6   | 1   | 10  | 838               | 13               | 1.13e+00
FDDI      | LEARNTA   | 16  |     | 7   | 6   | 9,986,271         | 43               | 3.00e+03


Table 2 also suggests a similar tendency: the number of membership queries
increases rapidly with the complexity of the timing constraints. In contrast, the
number of equivalence queries increases rather slowly. Moreover, LEARNTA is
scalable enough to learn a DTA with five clock variables within 15 min.

Table 1 also suggests that LEARNTA does not scale well to the maximum
constant in the guards, as observed in Sect. 4.6. However, we still observe that 
LEARNTA requires fewer equivalence queries than ONESMT. Overall, compared 
with ONESMT, LEARNTA has better scalability in the number of equivalence 
queries and worse scalability in the number of membership queries. 


5.2 RQ2: Performance on Practical Benchmarks 


To evaluate the practicality of LEARNTA, we used seven benchmarks: AKM, 
CAS, Light, PC, TCP, Train, and FDDI. Table 3 summarizes their complexity. All
the benchmarks other than FDDI are taken from [30] (or its implementation [1]). 
FDDI is taken from TChecker [2]. We use the instance of FDDI with two processes. 

Table 3 summarizes the results for the benchmarks from practical applica- 
tions. We observe, again, that LEARNTA requires more membership queries and 
fewer equivalence queries than ONESMT. However, for these benchmarks, the 
difference in the number of membership queries tends to be much smaller than 
in Random. This is because these benchmarks have simpler timing constraints 
than Random for the exploration by LEARNTA. In AKM, Light, PC, TCP, and 
Train, the clock variable can be reset at every edge without changing the lan- 
guage. For such a DTA, all simple elementary languages are equivalent in terms 
of the Nerode-style congruence if we have the same edge at their last event and 
the same dwell time after it. If two simple elementary languages are equivalent, 
LEARNTA explores the successors of only one of them, and the exploration is 
relatively efficient. We have a similar situation in CAS. Moreover, in many of
these DTAs, only a few edges have guards. Overall, despite the large number of 
locations and alphabets, these languages’ complexities are mild for LEARNTA. 

We also observe that, surprisingly, for all of these benchmarks, LEARNTA 
took a shorter time for DTA learning than ONESMT. This is partly because 
of the difference in the implementation language (i.e., C++ vs. Python) but 
also because of the small number of equivalence queries and the mild number of 
membership queries. Moreover, although it requires significantly more queries, 
LEARNTA successfully learned FDDI with seven clock variables. Overall, such 
efficiency on benchmarks from practical applications suggests the potential use- 
fulness of LEARNTA in some realistic scenarios. 
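
The runtime observation can be checked with simple arithmetic on Table 3. The following sketch copies the membership-query counts and execution times of the six benchmarks for which both tools have results and derives two ratios per benchmark. The dictionary layout and the function name `ratios` are illustrative assumptions, not part of LEARNTA or ONESMT.

```python
# (membership queries, execution time in seconds), copied from Table 3.
TABLE3 = {
    "AKM":   {"LEARNTA": (12_263, 5.85e-01),  "ONESMT": (3_453, 7.97e+00)},
    "CAS":   {"LEARNTA": (66_067, 4.65e+00),  "ONESMT": (4_769, 9.58e+01)},
    "Light": {"LEARNTA": (3_057, 3.30e-02),   "ONESMT": (210, 9.32e-01)},
    "PC":    {"LEARNTA": (245_134, 6.49e+01), "ONESMT": (10_390, 1.24e+02)},
    "TCP":   {"LEARNTA": (11_300, 3.82e-01),  "ONESMT": (4_713, 2.20e+01)},
    "Train": {"LEARNTA": (13_487, 1.72e-01),  "ONESMT": (838, 1.13e+00)},
}


def ratios(benchmark):
    """Return (membership-query overhead, speedup) of LEARNTA relative to ONESMT."""
    l_mem, l_time = TABLE3[benchmark]["LEARNTA"]
    o_mem, o_time = TABLE3[benchmark]["ONESMT"]
    return l_mem / o_mem, o_time / l_time


for name in TABLE3:
    overhead, speedup = ratios(name)
    print(f"{name:5s} {overhead:6.1f}x membership queries, {speedup:6.1f}x faster")
```

On these figures LEARNTA is faster on every listed benchmark while issuing more membership queries. The ratios do not separate the effect of the implementation language from the effect of the query counts.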


6 Conclusions and Future Work 


Extending the L* algorithm, we proposed an active learning algorithm for DTAs. 
Our extension is by our Nerode-style congruence for recognizable timed lan- 
guages. We proved the termination and the correctness of our algorithm. We also 
proved that our learning algorithm requires a polynomial number of queries with 
a smart teacher and an exponential number of queries with a normal teacher. 
Our experiment results also suggest the practical relevance of our algorithm. 

One of the future directions is to extend more recent automata learning
algorithms (e.g., the TTT algorithm [19], to improve the efficiency) to DTA
learning. Another direction is constructing a passive DTA learning algorithm
based on our congruence and an existing passive DFA learning algorithm. It is
also a future direction to apply our learning algorithm in practice, e.g., to the
identification of black-box systems and to testing black-box systems with
black-box checking [22,24,28]. Optimization of the algorithm, e.g., by
incorporating clock information, is also a future direction.
Abstract. AWS IoT Events is an AWS service that makes it easy to
respond to events from IoT sensors and applications. Detector models in 
AWS IoT Events enable customers to monitor their equipment or device 
fleets for failures or changes in operation and trigger actions when such 
events occur. If these models are incorrect, they may become out-of-sync 
with the actual state of the equipment causing customers to be unable 
to respond to events occurring on it. 

Working backwards from common mistakes made when creating 
detector models, we have created a set of automated analyzers that 
allow customers to prove their models are free from six common mis- 
takes. Our analyzers have been running in the AWS IoT Events pro- 
duction service since December 2021. Our analyzers check six correct- 
ness properties in the production service in real time. 93% of customers 
of AWS IoT Events have run our analyzers without needing to have any 
knowledge of them. Our analyzers have reported property violations in 
22% of submitted detector models in the production service. 


1 Introduction 


AWS IoT Events is a managed service for managing fleets of IoT devices. 
Customers use AWS IoT Events in diverse use cases such as monitoring
self-driving wheelchairs and monitoring a device’s network connectivity, humidity,
temperature, pressure, oil level, and oil temperature. Customers use
AWS IoT Events by creating a detector model that detects events occurring on
IoT devices and notifies an external service so that a corrective action can be
taken. An example is an industrial boiler that constantly reports its temperature
to a detector. The detector tracks the boiler’s average temperature over the
past 90 min and notifies a human operator when it is running too hot.
Each detector model is defined as a finite state machine with dynamically
typed variables and timers, where timers allow detectors to keep track of state 
over time. A model processes inputs from IoT devices to update internal state 
and to notify other AWS services when events are detected. Customers can use 
a single detector model to instantaneously detect events in thousands of devices. 
Ensuring well-formedness of a detector model is crucial as ill-formed detector 
models can miss events in every monitored device. 

Starting from a survey that identified sources of well-formedness problems in
customer models, we identified the most common mistakes made by customers and
detect them using type- and model-checking. To use a model-checker for checking
well-formedness of a detector model, we formalize the execution semantics of a 
detector model and translate this semantics into the source-language notation of 
the JKind model checker [1]. Model checking [2-9] verifies desirable properties 
over the behavior of a system by performing the equivalent of an exhaustive 
enumeration of all the states reachable from its initial state. Most model checking 
tools use symbolic encodings and some form of induction [6] to prove properties 
of very large finite or even infinite state spaces. 

We have implemented type-checking and model-checking as an analysis fea- 
ture in the production AWS IoT Events service. Our analyzers have reported 
well-formedness property violations in 22% of submitted detector models. 93% 
of customers of AWS IoT Events have checked their detector models using our 
analyzers. Our analyzers report property violations to customers with an average 
latency of 5.6 s (see Sect. 4).

Our contributions are as follows: 


1. We formalize the semantics of AWS IoT Events detector models. 

2. We identify six well-formedness properties whose violations detect common 
customer mistakes. 

3. We create fast, push-button analyzers that report property violations to cus- 
tomers. 


2 Overview 


Consider a user of AWS IoT Events who wants to monitor the temperature of an 
industrial boiler. If the industrial boiler overheats, it can cause fires and endanger 
human lives. To detect an early warning of an overheating event, they want to 
automatically identify two different alarming events on the boiler’s temperature. 
They want their first alarm to be triggered if the boiler’s reported temperature 
is outside the normal range for more than 1 min. They want their second alarm 
to be triggered if the temperature is outside the normal range for another 5 min 
after the first alarm. 

A user might try to implement these requirements by creating the (flawed) 
detector model shown in Fig. 1. This detector receives temperature data from 
the boiler and responds by sending a text message to the user. The detector 
model contains four states: 
Fig. 1. AWS IoT Events detector model with two alarms (buggy version)

Fig. 2. An action in the detector model from Fig. 1


— TempOK: starting state of the detector model. The detector stays in this 
state as long as the boiler’s temperature lies in a normal range. The detec- 
tor transitions from TempOK to GettingTooHot on detecting a temperature 
outside normal range, indicated by TempAbnormal. 

— GettingTooHot: detector starts a 1 min timer and transitions back to TempOK 
if the boiler cools down. When the timer expires, it transitions to TooHot. 

— TooHot: detector first notifies the user of the 1st alarm. It then starts a 5 min 
timer and transitions back to TempOK if the boiler cools down. When the 
5 min timer expires, it transitions to StillTooHot.

— StillTooHot: detector notifies user of the 2nd alarm. 


Understanding the Bug: Every state in the detector model consists of actions. 
An action changes the internal state of a detector or triggers an external service. 
For example, the GettingTooHot state consists of an action that starts a timer. 
The user can edit these actions with an interface shown in Fig. 2. This action 
starts a one minute timer named Wait1Min. Note that timers are accessible from 
every state in the detector model. Even though the Wait1Min timer is created 
in the GettingTooHot state of Fig. 1, it can be checked for expiration in all the 
four states of Fig. 1. 

The detector model in Fig. 1 has a fatal flaw caused by a typo. The user has
written timeout("Wait1Min") instead of timeout("Wait5Min") when transitioning
out of TooHot. This is allowed because timers are globally referenceable. However,
it is a bug because each global timer has a unique name and the Wait1Min timer
has already been used and has expired. Since a timer can expire at most once,
StillTooHot is unreachable, meaning the 2nd alarm will never fire.
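
The effect of the typo can be reproduced with a small simulation. The sketch below is a deliberately simplified stand-in for the detector of Fig. 1, not the AWS IoT Events runtime: the step encoding (`("temp", ...)` messages and `("tick", seconds)` time advances), the function name `simulate`, and the rule that a fired timer is simply removed are all assumptions made for illustration.

```python
def simulate(too_hot_exit_timer, steps):
    """Run the four-state boiler detector and return the states it visits.

    too_hot_exit_timer is the timer name checked when leaving TooHot:
    "Wait1Min" reproduces the typo, "Wait5Min" is the intended guard.
    """
    state = "TempOK"
    timers = {}          # active timers only: name -> seconds remaining
    visited = [state]

    def enter(new_state):
        nonlocal state
        state = new_state
        visited.append(state)
        if state == "GettingTooHot":
            timers["Wait1Min"] = 60      # entry action: start the 1 min timer
        elif state == "TooHot":
            timers["Wait5Min"] = 300     # entry action: start the 5 min timer

    for kind, value in steps:
        fired = set()
        if kind == "tick":
            for name in list(timers):
                timers[name] -= value
                if timers[name] <= 0:
                    fired.add(name)      # the timer expires exactly once
                    del timers[name]
        if kind == "temp" and value == "normal" and state in ("GettingTooHot", "TooHot"):
            enter("TempOK")
        elif kind == "temp" and value == "abnormal" and state == "TempOK":
            enter("GettingTooHot")
        elif state == "GettingTooHot" and "Wait1Min" in fired:
            enter("TooHot")
        elif state == "TooHot" and too_hot_exit_timer in fired:
            enter("StillTooHot")
    return visited


steps = [("temp", "abnormal"), ("tick", 60), ("tick", 300)]
buggy = simulate("Wait1Min", steps)
fixed = simulate("Wait5Min", steps)
print(buggy)
print(fixed)
```

With the misspelled guard the run stops in TooHot even after the 5 min timer fires, while the corrected guard reaches StillTooHot on the same inputs.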


Related Work. Languages such as IOTA [10], SIFT [11], and the system from 
Garcia et al. [12] use trigger-condition-action rules [13] to control the behavior of 
internet of things applications. These languages have the benefit of being largely
declarative, allowing users to specify desired actions under different environmen- 
tal stimuli. Similar to our approach, SIFT [11] automatically removes common 
user mistakes as well as compiles specifications into controller implementations 
without user interaction, and IOTA [10] is a reasoning calculus that allows cus- 
tom specifications to be written both about why something should or should not 
occur. AWS IoT Events is designed explicitly for monitoring, rather than con- 
trol, and our approach is imperative, rather than declarative: detector models 
do not have the same inconsistencies as rule sets, as they are disambiguated 
using explicit priorities on transitions. On the other hand, customers may still 
construct machines that do not match their intentions, motivating the analyses 
described in this paper. 


3 Technique 


In this section, we present a formal execution semantics of an AWS IoT Events 
detector model and describe specifications for the correctness properties. 


Formalization of Detector Models. Defining the alphabet and the transition 
relation for the state machine is perhaps the most interesting aspect of our for- 
malization. Since detector models may contain global timers, timed automata [14] 
might seem like an apt candidate abstraction. However, AWS IoT Events users 
are not allowed to change the clock frequency of timers, nor specify arbitrary 
clock constraints. These observations allow us to formalize the detector models 
as a regular state machine, with timeout durations as additional state variables. 

Formally, we represent the state machine for a detector model M as a tuple
(S, S_0, I, G, T, E_E, E_X, E_I), where:


— S: finite set of states in the FSM,

— S_0 ⊆ S: set of initial state(s),

— I: set of input variables assigned by the environment,

— G: set of global variables assigned by the state machine,

— T: set of timer variables that are reset by the model and updated as time
evolves in the environment,

— E_E : S → κ list: mapping from states to a (possibly empty) list of entry
events to be performed when entering a state. κ describes an event, further
explained in the description of the grammar,

— E_X : S → κ list: mapping from states to a list of exit events to be
performed when exiting a state,

— E_I : S → (κ list × μ list): mapping from states to a list of input events,
including transitions to other states.


It is assumed that the sets I, G, and T are pairwise disjoint, and we define
the set V ≜ I ∪ G to represent input and global variables in the model.

We denote by V the set of values for global (G) and input (I) variables; V
ranges over the values of primitive types: integers, decimals (rationals), booleans,
and strings. Integers and rationals are assumed to be unbounded, and rationals
are arbitrarily precise. We use ℕ as the domain for time and timeout values. Sets
V_⊥ and ℕ_⊥ are extended with the value ⊥ to represent an uninitialized variable.


τ ::= int | dec | str | bool
ε ::= e_0 bop e_1 | uop e_0 | l | v | timeout(t) | isundefined(v) | ...
α ::= setTimer(t, e) | resetTimer(t) | clearTimer(t) | setGlobal(g, e)
κ ::= event(e, α*)
μ ::= transition(e, α*, s)
ι ::= message(i, v) | timeout(t)


Fig. 3. Types, expressions, actions, and events in IoT Events Detector Models

The grammar for types (τ), expressions (ε), actions (α), events (κ), transitions
(μ), and input triggers (ι) is shown in Fig. 3. In the grammar, metavariable
e stands for an expression, l stands for a literal value in V, v stands for any
variable in V, t is a timer variable in T, a is an action, and i is an input in I. The
unary and binary operators include standard arithmetic, Boolean, and relational
operators. The timeout expression is true at the instant timer t expires, and the
isundefined expression returns true if the variable or timer in question has
not been assigned. Actions (α) describe changes to the system state: setTimer
starts a timer and sets the periodicity of the timer, while resetTimer and
clearTimer reset and clear a timer (without changing the periodicity of the
timer). The setGlobal action assigns a global variable. Events (κ) describe
conditions under which a sequence of actions occurs.
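
One way to make the grammar concrete is to write it as a small abstract syntax tree. The sketch below covers only a subset of the expression and action forms, and every class and field name in it is an assumption chosen for illustration; it is not the data model used by AWS IoT Events.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Expressions (a subset: literals, variables, binary operators, timeout).
@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Timeout:
    timer: str


Expr = Union[Lit, Var, BinOp, Timeout]


# Actions (a subset: setTimer and setGlobal).
@dataclass(frozen=True)
class SetTimer:
    timer: str
    duration: Expr


@dataclass(frozen=True)
class SetGlobal:
    name: str
    value: Expr


Action = Union[SetTimer, SetGlobal]


# An event pairs a condition with the actions to run when it holds.
@dataclass(frozen=True)
class Event:
    condition: Expr
    actions: Tuple[Action, ...]


# A transition is an event that also names a target state.
@dataclass(frozen=True)
class Transition:
    condition: Expr
    actions: Tuple[Action, ...]
    target: str


# Hypothetical encoding of the TooHot state from the boiler example.
start_timer = Event(Lit(True), (SetTimer("Wait5Min", Lit(300)),))
too_hot_exit = Transition(Timeout("Wait5Min"), (), "StillTooHot")
```

Frozen dataclasses are used so that syntax trees compare by value, which makes them convenient to inspect in tests.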

We define configurations C for the state machine as:


$$C \triangleq S \times (I \to V_\bot) \times (T \to (\mathbb{N}_\bot \times \mathbb{N}_\bot)) \times (G \to V_\bot)$$

Each configuration C = (s, i, t, g) tracks the following:


— a state s ∈ S in the detector model,

— the input valuation i ∈ (I → V_⊥) containing the values of inputs,

— the timer valuation t ∈ (T → (ℕ_⊥ × ℕ_⊥)) for user-defined timers. Each timer
has both a periodicity and (if active) a time remaining, and

— the global valuation g ∈ (G → V_⊥) for global variables in the detector model.


Example 1. Consider a corrected version of our example detector model from 
Fig. 1 which has two timers, Wait1Min and Wait5Min, and no global variables. 
Some examples of configurations for this model are: 


— (TempOK, {temp : ⊥}, {Wait1Min : (⊥, ⊥), Wait5Min : (⊥, ⊥)}, {}) is the initial configuration.
The model contains input temp, timers Wait1Min and Wait5Min, and no global
variables. As no variables or timers have been assigned, all variables have value
undefined (⊥).

Fig. 4. Rules describing behavior of the system


— (TooHot, {temp : 300}, {Wait1Min : (60, ⊥), Wait5Min : (300, 260)}, {}) is the configuration
at global time t if the temperature is still beyond the normal range and we
transition to the TooHot detector model state. Note the Wait1Min timer is no
longer set whereas the Wait5Min timer has a periodicity of 300 and is set to
expire at t + 260.
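
The two example configurations can be written down directly as data. In the sketch below the class name `Config`, the use of `None` for the undefined value, and the use of `dataclasses.replace` for a component update are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    s: str     # current detector model state
    i: dict    # input valuation: input name -> value (None means undefined)
    t: dict    # timer valuation: timer name -> (periodicity, time remaining)
    g: dict    # global valuation: variable name -> value


# Initial configuration: nothing has been assigned yet.
initial = Config(
    s="TempOK",
    i={"temp": None},
    t={"Wait1Min": (None, None), "Wait5Min": (None, None)},
    g={},
)

# Configuration after entering TooHot, 40 s into the 5 min timer.
too_hot = Config(
    s="TooHot",
    i={"temp": 300},
    t={"Wait1Min": (60, None), "Wait5Min": (300, 260)},
    g={},
)

# Updating one component while keeping the others, as in C[s <- s'].
moved = replace(initial, s="GettingTooHot")
```

Here `moved` differs from `initial` only in its state component, which mirrors the component-update notation used for the semantic rules.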


To define the execution semantics, we create a structural operational semantics
for each of the grammar rules and for the interaction with the external
environment, as shown in Fig. 4. We distinguish semantic rules by decorating
the turnstiles with the grammar type that they operate over (ε, α, κ, μ, E_I, and
ι). The variables e, a, k, m, i stand in for elements of the appropriate syntactic
class defined by the turnstile. For lists of elements, we decorate the syntactic
class with * (e.g. ⊢_α*), and the variables with l (e.g. al). We use the following
notation conventions: Given C = (s, i, t, g), we say C.s = s, and similarly with
the other components of C. We also say C[s ← s'] is equivalent to (s', i, t, g),
and similarly with the other components of C.

Expressions (⊢_ε) evaluate to values, given a configuration. We do not present
expression rules (they are simple), but illustrate the other rule types in Fig. 4.
For actions (⊢_α), the setTimer rule establishes the periodicity of a timer and
also starts it. The resetTimer and clearTimer rules restart an existing timer
given a periodicity p or clear it, respectively, and the setGlobal rule updates
the value of a global variable. Events (κ) are used by entry and exit events for
states. The list rules for actions (α*) and events (κ*) are not presented but are
straightforward: they apply the relevant rule to the head of the list and pass the
updated configuration to the remainder of the list, or return the configuration
unchanged for nil. Transition event lists (μ*) cause the system to change state,
executing (only) the first transition from the list whose guard e evaluates to
true. Finally, the top-level rule ⊢_ι describes how the system evolves according to
external stimuli.

A run of the machine is any valid sequence of configurations produced by
repeated applications of the ⊢_ι rule. Timeout inputs increment the time to the
earliest active timeout as described by the matchesEarliest predicate:


$$\mathit{matchesEarliest}(t, x) \triangleq \exists t_i, p_i .\ (p_i, x) = t(t_i) \;\wedge\; \forall t_j, p_j, y .\ \big((p_j, y) = t(t_j) \Rightarrow y = \bot \vee y \geq x\big)$$


The subtractTimers function subtracts t_i from each timer in C, and the
clearTimers function, for any timers whose time remaining is equal to zero,
calls the clearTimer action.
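
The three timer helpers can be sketched as follows, reading the prose literally: a timeout input is valid when its value equals the smallest remaining time among the active timers. The timer valuation is assumed to be a dictionary from timer name to a `(periodicity, remaining)` pair with `None` for an inactive timer, and the Python function names are our own.

```python
def matches_earliest(timers, x):
    """True if x is the smallest remaining time among the active timers."""
    remaining = [r for (_period, r) in timers.values() if r is not None]
    return x in remaining and all(r >= x for r in remaining)


def subtract_timers(timers, x):
    """Advance time by x: reduce the remaining time of every active timer."""
    return {name: (p, None if r is None else r - x)
            for name, (p, r) in timers.items()}


def clear_timers(timers):
    """Deactivate timers that have just reached zero, keeping their periodicity."""
    return {name: (p, None if r == 0 else r)
            for name, (p, r) in timers.items()}


timers = {"Wait1Min": (60, 20), "Wait5Min": (300, 260)}
if matches_earliest(timers, 20):
    timers_after = clear_timers(subtract_timers(timers, 20))
    print(timers_after)
```

In this example the 1 min timer is the earliest, so a timeout input of 20 is accepted, the 5 min timer drops to 240 remaining, and the 1 min timer becomes inactive while keeping its periodicity of 60.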


3.1 Well-formedness Properties 


To find common issues with detector models, we surveyed (i) detector models 
across customer tickets submitted to AWS IoT Events, (ii) questions posted on 
internal forums like the AWS re:Post forum [15], and (iii) feedback submitted via 
the web-based console for AWS IoT Events. Based on this survey, we determined 
that the following correctness properties should hold over all detector models. 
For more details about this survey, please refer to Appendix A. 


The Model does not Contain Type Errors: The AWS IoT Events expression 
language is untyped, and thus, may contain ill-typed expressions, e.g., performing 
arithmetic operations on Booleans. A large class of such bugs can be readily 
detected and prevented using a type inference algorithm. The algorithm follows 
the standard Hindley-Milner type unification approach [16-18] and generates 
(and solves) a set of type constraints or reports an error if no valid typing 
is possible. We use this type inference algorithm to detect type errors in the 
detector model. Every type error is reported as a warning to the customer. 
When our type inference successfully infers types for expressions, we use them 
to construct a well-typed abstract state machine using the formalization reported 
in Sect. 3. 
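
The flavor of constraint-based type inference can be shown on a toy expression language. The sketch below is a heavily reduced illustration of unification over flat types (no polymorphism, no function types), not the authors' algorithm: the tuple encoding of expressions, the operator table `OPS`, and the restriction of arithmetic to `int` are all assumptions.

```python
# Operator table (assumed): operator -> (operand type, result type).
OPS = {"+": ("int", "int"), "<": ("int", "bool"), "&&": ("bool", "bool")}


def resolve(t, subst):
    """Follow substitution links until reaching a concrete or unbound type."""
    while t in subst:
        t = subst[t]
    return t


def unify(a, b, subst):
    """Make two types equal, binding type variables (names starting with '?')."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if a.startswith("?"):
        subst[a] = b
    elif b.startswith("?"):
        subst[b] = a
    else:
        raise TypeError(f"cannot unify {a} with {b}")


def infer(expr, env, subst):
    """Return the type of expr, recording constraints on variables in subst."""
    kind = expr[0]
    if kind == "lit":
        value = expr[1]
        if isinstance(value, bool):   # check bool first: bool is a subclass of int
            return "bool"
        if isinstance(value, int):
            return "int"
        if isinstance(value, str):
            return "str"
        return "dec"
    if kind == "var":
        # Each variable gets one type variable the first time it is seen.
        return env.setdefault(expr[1], "?" + expr[1])
    if kind == "bop":
        operand, result = OPS[expr[1]]
        unify(infer(expr[2], env, subst), operand, subst)
        unify(infer(expr[3], env, subst), operand, subst)
        return result
    raise ValueError(f"unknown expression form: {kind}")


env, subst = {}, {}
guard = ("bop", "<", ("bop", "+", ("var", "temp"), ("lit", 1)), ("var", "limit"))
guard_type = infer(guard, env, subst)
temp_type = resolve(env["temp"], subst)
```

For the guard `temp + 1 < limit` the sketch infers `bool` for the whole expression and `int` for both variables, while an expression such as `True + 1` raises a type error because `bool` and `int` cannot be unified.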

For the remaining well-formedness properties we use model checking. We 
introduce one or more indicator variables in our global abstract state to track 
certain kinds of updates in the state machine, and then we assert temporal 
properties on these indicator variables. Because we use a model checker that 


1 In the interests of space, we do not cover the batch execution mode, where all variables 
used in expressions maintain their “pre-state” value until the step is completed; it is 
a straightforward extension. 
checks only safety properties, in many cases we invert the property of interest
and check that its negation is falsifiable, using the same mechanism often used 
for test-case generation [19]. 


Every Detector Model State is Reachable and Every Detector Model 
Transition and Event can be Executed: For each state s ∈ S, we add a
new Boolean reachability indicator variable v^s_reached to our abstract state that
is initially false and assigned true when the state is entered (similarly for
transitions and events). To encode the property in a safety property checker, we
encode the following unreachability property expressed in LTL and check it is
falsifiable. If it is provable, the tool warns the user.


$$\mathit{Unreachable}(s) \triangleq \Box\,(\neg v^{s}_{\mathit{reached}})$$
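
The indicator-variable idea can be illustrated with an explicit-state search, used here as a stand-in for a symbolic model checker. The sketch abstracts the boiler detector of Fig. 1 to pairs of a state and a set of active timers; this abstraction, the function names, and the successor rules are assumptions for illustration and are not the Lustre encoding used by the production analyzers.

```python
from collections import deque

STATES = ["TempOK", "GettingTooHot", "TooHot", "StillTooHot"]


def successors(config, exit_timer):
    """All abstract configurations reachable in one step."""
    state, active = config
    out = []
    if state == "TempOK":                      # abnormal temperature message
        out.append(("GettingTooHot", active | {"Wait1Min"}))
    if state in ("GettingTooHot", "TooHot"):   # boiler cools down
        out.append(("TempOK", active))
    for timer in active:                       # some active timer expires
        rest = active - {timer}
        if state == "GettingTooHot" and timer == "Wait1Min":
            out.append(("TooHot", rest | {"Wait5Min"}))
        elif state == "TooHot" and timer == exit_timer:
            out.append(("StillTooHot", rest))
        else:
            out.append((state, rest))          # expiry with no enabled transition
    return out


def reached_indicators(exit_timer):
    """Explore all configurations and record which states are ever entered."""
    reached = {s: False for s in STATES}       # indicator variables, initially false
    start = ("TempOK", frozenset())
    reached["TempOK"] = True
    seen, queue = {start}, deque([start])
    while queue:
        config = queue.popleft()
        for nxt in successors(config, exit_timer):
            reached[nxt[0]] = True             # set when the state is entered
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return reached


buggy = reached_indicators("Wait1Min")
fixed = reached_indicators("Wait5Min")
unreachable_in_buggy = [s for s in STATES if not buggy[s]]
```

A state whose indicator stays false after the search corresponds to the case where the unreachability property is provable, so the tool would warn. In the buggy model that is exactly StillTooHot.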


Every Variable is Set Before Use: In order to test that variables are properly 
initialized, first we identify the places where variables are assigned and used. In 
detector models, there are three places where variables are used: in the evaluation 
of conditions for events and transitions, and in the setGlobal action (which 
occurs because of an event or transition). We want to demonstrate that the 
variables used within these contexts are never equal to ⊥ during evaluation. In
this case, we can reuse the reachability variables that we have created for events 
and transitions to encode that variables should always have defined values when 
they are used. 

We first define some functions to extract the set of variables used in expressions
and action lists. The function Vars(e) : ε → v set simply extracts the
variables in the expression. For action lists, it is slightly more complex, because 
variables are both defined and used: 


Vars(nil) = {}
Vars(setTimer(t, e) :: tl) = Vars(e) ∪ Vars(tl)
Vars(resetTimer(t) :: tl) = Vars(tl)
Vars(clearTimer(t) :: tl) = Vars(tl)
Vars(setGlobal(g, e) :: tl) = Vars(e) ∪ (Vars(tl) \ {g})
Vars(event(e, al)) = Vars(e) ∪ Vars(al)
Vars(transition(e, al, s’)) = Vars(e) ∪ Vars(al)
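
These equations translate almost line by line into a recursive function. In the sketch below expressions and actions are encoded as tagged tuples, which is an assumption of this illustration; only variable, literal, unary, binary, and timeout expressions are handled, and the function names are our own.

```python
def vars_expr(e):
    """Variables read by an expression (literals and timeout(t) read none)."""
    kind = e[0]
    if kind == "var":
        return {e[1]}
    if kind == "bop":                       # ("bop", op, e0, e1)
        return vars_expr(e[2]) | vars_expr(e[3])
    if kind == "uop":                       # ("uop", op, e0)
        return vars_expr(e[2])
    return set()


def vars_actions(actions):
    """Variables an action list reads before it assigns them itself."""
    if not actions:
        return set()
    head, tail = actions[0], actions[1:]
    if head[0] == "setTimer":               # ("setTimer", t, e)
        return vars_expr(head[2]) | vars_actions(tail)
    if head[0] in ("resetTimer", "clearTimer"):
        return vars_actions(tail)
    if head[0] == "setGlobal":              # ("setGlobal", g, e)
        # g is defined from here on, so later uses of g do not count.
        return vars_expr(head[2]) | (vars_actions(tail) - {head[1]})
    raise ValueError(f"unknown action: {head[0]}")


def vars_event(condition, actions):
    """Variables used by an event or transition with the given guard."""
    return vars_expr(condition) | vars_actions(actions)


actions = [
    ("setGlobal", "count", ("lit", 0)),
    ("setTimer", "t1", ("bop", "+", ("var", "count"), ("var", "limit"))),
]
used = vars_event(("var", "temp"), actions)
```

In the example, `count` is assigned before the timer expression reads it, so only `temp` and `limit` need to be defined when the event executes.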


Every event or transition can be executed at most once during a computation 
step, so we can use the execution indicator variables to determine when a variable 
might be used. 


$$\forall a_i,\ v_j \in \mathit{Vars}(a_i) .\quad \mathit{SetBeforeUse}(a_i, v_j) \triangleq \Box\,\big(v^{a_i}_{\mathit{exec}} \Rightarrow v_j \neq \bot\big)$$


Input Read Only on Message Trigger: This property is covered in the 
previous property, with one small change. To enforce it, we modify the translation 
of the semantics slightly so that at the beginning of each step, prior to processing 
the input message, all input variables are assigned ⊥.
Message Triggered Between Consecutive Timeouts: We conservatively
approximate a liveness property (no infinite path consisting of only timeout
events) with a safety property: the same timer should not time out twice without
an input message occurring in between the timeouts. This formulation may flag
models that do not have infinite paths with no input events, but our customers
consider it a reasonable indicator.

We begin by defining an indicator variable for each timer t_i (of type integer
rather than Boolean), v^{t_i}_timeout, and initialize it to zero. We modify the
translation of updateTimers to increment this variable when its timer variable
equals zero, and modify the translation of the message rule to reset all
v^{t_i}_timeout variables to zero. The property of interest is then:


$$\mathit{NoConsecutiveTimeouts}(t_i) \triangleq \Box\,\big(v^{t_i}_{\mathit{timeout}} < 2\big)$$
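
The counter-based check can be sketched on a finite trace of inputs. The trace encoding (pairs of `"timeout"` or `"message"` with a name) and the function name are assumptions for illustration; the production analysis checks the property over all runs with a model checker rather than over a single trace.

```python
def violates_no_consecutive_timeouts(trace):
    """True if some timer times out twice with no message in between."""
    counts = {}                       # per-timer indicator, implicitly zero
    for kind, name in trace:
        if kind == "message":
            counts.clear()            # a message resets every indicator to zero
        elif kind == "timeout":
            counts[name] = counts.get(name, 0) + 1
            if counts[name] >= 2:     # the indicator must stay below 2
                return True
    return False


looping = [("timeout", "Heartbeat"), ("timeout", "Heartbeat")]
healthy = [("timeout", "Heartbeat"), ("message", "temp"), ("timeout", "Heartbeat")]
print(violates_no_consecutive_timeouts(looping))
print(violates_no_consecutive_timeouts(healthy))
```

Because the counter is kept per timer, two different timers expiring back to back are not flagged, which matches the statement that the same timer should not time out twice.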


4 Experiments 


In this section, we evaluate the performance of model-checking safety properties 
on detector models, with a focus on model checking latency. Low analysis latency 
is crucial because our tool warns customers of property violations while they are 
editing their detector model. Our type inference implementation runs with an 
average latency of 10 milliseconds on all the detector models in our experiments. 
Since type inference is much faster than model checking and can be successfully 
run on all detector models, we do not evaluate it in this section. 

AWS IoT Events has a commercial feature [20] which uses the type check- 
ing and model checking described in Sect. 3. The feature’s implementation first 
infers types using the type inference algorithm. Next, it translates the detector 
model into the Lustre language [21]. The translation of IoT Events into Lustre 
is straightforward and directly follows from the semantics presented in Sect. 3. 
The safety properties described in Sect. 3.1 are attached to the model, along with 
location information. Then the feature analyzes the model using the JKind [1] 
tool suite, an open-source industrial model-checker. If JKind invalidates a safety 
property, the feature decodes the location from the safety property and includes 
it in the warning. 

To evaluate this implementation, we randomly selected 210 detector models 
previously analyzed by the commercial feature. We checked the five properties 
described in Sect. 3.1 in parallel on a c4.8xlarge EC2 instance running Amazon 
Linux 2 x86_64 using JKind version 4.4.1, with a timeout of 60s. 

Of the safety properties that we were able to translate to Lustre, JKind 
resolved 96% within our timeout of 60s, with 80% completing in less than 10s. 

Table 1 shows that checking the no-unreachable-action safety property
requires the most time to complete. The detector models analyzed in the
evaluation include models for monitoring self-driving wheelchairs, monitoring
device connectivity, humidity, temperature, pressure, oil level, oil temperature,
doors, motion, refrigerator temperature, dough fermentation, and vehicle
speed-sensing. They consisted of between 1 and 7 states and between 0 and 14
state changes. The no-unreachable-action safety property is checked on every
action, generating an

36 A. Apicelli et al. 


Table 1. Performance of our model-checking tool against 210 detector models


safety property                     | avg. latency (milliseconds) | # completed | # translation failed | # timeout
no-unreachable-state                | 3544                        | 176         | 28                   | 6
no-unreachable-action               | 5586                        | 171         | 28                   | 11
var-always-set-before-use           | 2968                        | 179         | 28                   | 3
no-infinite-timer-expiration        | 2875                        | 174         | 28                   | 8
no-input-read-with-timer-expiration | 5477                        | 177         | 30                   | 3


average of 17 safety properties per detector model, the most of any kind of 
safety property. This large number of properties to be checked on every detector 
model caused checking the no-unreachable-action safety property to have the 
highest average latency (5.6s per analysis). 

Table 1 shows that about 13% of the properties could not be translated to 
Lustre. In 2% of the detector models, translation failures arose due to type 
errors or incorrect use of the AWS IoT Events expression language in the detec- 
tor model. The remaining translation failures occurred due to either: (1) use 
of operations not supported by Lustre, (2) no types being inferred for inputs 
or variables in the detector model, or (3) use of non-linear arithmetic, which 
is unsupported in JKind. Bitwise functions, strings, and array data types are 
supported in the AWS IoT Events expression language but not in Lustre. This 
language gap prevented us from translating 19 of the 210 detector models. Fail- 
ing to infer a type for a variable in the detector model prevented translation of 
6 of the 210 detector models. JKind’s lack of support for non-linear arithmetic 
prevented model-checking 2 of the 210 detector models. We are actively working 
to support more functions, string and array data types, type annotations, and 
non-linear arithmetic in our model-checking of detector models. 


5 Conclusion 


Our analyzers have been running in the AWS IoT Events production service 
since December 2021. Since then, 93% of AWS IoT Events customers have used 
our implementation to check their detector models for well-formedness, without 
needing to have any knowledge of the underlying type checking and model check- 
ing. Our analyzers successfully complete for 85% of real-world detector models 
and we are actively working on improving this support as explained in Sect. 4. 
Overall, our implementation has reported well-formedness property violations in 
22% of submitted detector models in the production service, with an average 
latency of 5.6s. We find giving customers push-button access to fast verification 
without requiring any knowledge of the underlying techniques enables adoption 
of automated reasoning-based tools. 
A Common Issues with Detector Models


Table 2. Issues seen in detector models from customer questions


# | Issue                              | # of instances
1 | incorrectly scaling detector model | 1
6 | insufficient logging permissions   | 3
8 | incorrect cross-service setup      | 8
9 | missing simplifications            | 8
  | Total                              | 36


As mentioned in Sect. 3.1, we surveyed customer detector models for generic 
correctness problems. We present the root causes of the problems from this study 
in Table 2. Incorrect scaling (#1) occurs when the customer does not set up their 
detector model to be instantiated correctly for every IoT device in their fleet. 
Infinite loop (#3) occurs when the detector model has an infinite execution path 
involving only timeout events and no external input messages. IoT models should 
be eventually quiescent if no external inputs occur. 

Variable-used-before-set (#4) occurs when a variable in the detector’s state 
is read from before being set to an initial value. AWS IoT Events does not require 
variables in detector models to be initialized. 

A step through a detector can be triggered by either a timer expiration
or a new value being sent to the detector by the outside world. Input read on
timer expiration (#5) occurs when a step, triggered by timer expiration, causes
the detector to read from its input(s). This is a problem because customers often
do not realize that such a read will return the last value sent to the detector by
the outside world. Insufficient logging permissions (#6) occurs when a detector
is not given sufficient permissions to produce logging output. Incorrect
cross-service setup (#8) occurs when customers do not correctly set up data flow
across services in AWS IoT. While unnecessary complexity in detector models (#9)
is not a correctness problem, it poses a significant difficulty to customers in
maintaining their detector models, and so we include it in Table 2.

Of these 9 root causes, we identified that type checking and model checking 
detected 5 root causes highlighted in green in Table 2. These 5 root causes were 
responsible for 44% of issues in our survey. Based on Table 2, we determined that 
the following correctness properties should hold over all detector models: 


1. Detector models must be well-typed 
2. Every detector model state must be reachable 




3. Every detector model action must be executable 

4. Every variable must be set before being used 

5. Input reads shall not happen on timer expiration 

6. Detector model must not have infinite timer expirations 


We explain these properties further in Sect. 3.1.

Abstract. Compositional verification, such as the technique of
assume-guarantee reasoning (AGR), verifies a property of a system from the
properties of its components. It is essential for addressing the state explosion
problem associated with model checking. However, obtaining an appropriate
assumption for AGR is always an intellectually demanding challenge, especially
in the case of timed systems. In this paper, we propose a learning-based
compositional verification framework for deterministic timed automata. 
In this framework, a modified learning algorithm is used to automatically 
construct the assumption in the form of a deterministic one-clock timed 
automaton, and an effective scheme is implemented to obtain the clock 
reset information for the assumption learning. We prove the correctness 
and termination of the framework and present two kinds of improvements 
to speed up the verification. We discuss the results of our experiments to 
evaluate the scalability and effectiveness of the framework. The results 
show that the framework we propose can reduce state space effectively, 
and it outperforms traditional monolithic model checking for most cases. 


1 Introduction 


Model checking [9,19,33,36] is an important technique to automatically deter- 
mine whether a system satisfies a specified property. However, it suffers from 
the state explosion problem since it needs to store the explored system states in 
memory, which is impossible for most realistic systems [21]. In timed systems, 
although symbolic representations and partial order reductions have greatly 
increased the size of the systems that can be verified, many realistic timed sys- 
tems are still too large to be handled. In particular, if a system has several 
components, the number of global system states will grow exponentially with 
the number of components. Assume-guarantee reasoning (AGR) [20,25,29,35] is
a promising method helpful to address the state explosion problem.

Consider a system $M$ composed of two components $M_1$ and $M_2$ that
synchronize on a given set of shared actions. Supposing we are to verify that $M$
satisfies a property $\phi$, the verification rule in AG states that if there exists
an assumption $A$ on the environment of $M_2$ such that 1) $M_1$ and $A$ satisfy
the property $\phi$, and 2) $M_2$ is a refinement of $A$, then $M$ satisfies $\phi$.

A major challenge in verifying component-based systems using the AG rule is 
the need to obtain the appropriate assumption that requires non-trivial human 
effort [26]. Based on abstraction-refinement paradigm in [22], the assumption is 
computed as a conservative abstraction of some of the components, and it is 
then refined using counterexamples obtained from model checking it [15]. The 
algorithm presented in [24] is capable of generating the weakest possible assump- 
tion automatically, though it does not compute partial results. In the later work 
[23], a framework is proposed for the automatic generation of assumptions in 
an incremental fashion using the L* learning algorithm [8]. Several improve- 
ments, e.g. [14,17,18,38], are proposed to further reduce the learning complex- 
ity. The work [6] by Alur et al. presents a symbolic implementation of the L* 
algorithm where the required data structures are maintained compactly using 
ordered BDDs [16]. 

All the aforementioned work focuses on untimed systems. For timed systems, 
using assume-guarantee style proof rules, the work in [39] proves a refined rep- 
resentation is a correct implementation of an abstract one. To check Zeroconf, 
a protocol for dynamic configuration of IPv4 link-local addresses, Berendsen et 
al. [12] model the protocol as a network of timed automata (TAs) [3,4], and 
provide a proof that combines model checking with the application of a new 
abstraction relation that is compositional with respect to committed locations. 
However, the abstract models there are all provided manually. Compared to the 
manual methods, the compositional verification framework presented in [31,32] 
utilizes a learning algorithm for automatic construction of timed assumptions 
for AGR. The work considers event-recording automata [5], which are a subclass 
of timed automata. Sankur [37] gives compositional verification for the system 
composed by a deterministic finite automaton (DFA) and a timed automaton, 
where a DFA assumption is learned [27] to approximate the timed component. 
The framework can only check the untimed property of the system and it has 
the limitation that the TA size is relatively small. 

The timed automaton is the most appreciated model for its simplicity and 
adequacy in expressiveness, and it is widely used for practical real-time systems 
[28,30]. However, to the best of our knowledge, though compositional verifica- 
tion for timed systems helps mitigate the state space explosion problem, there is 
still no work to tackle the problem of automatically inferring the timed assump- 
tions based on AGR for timed automata. Therefore, we propose, in this paper, a 
learning-based framework for AG-based automatic verification of deterministic 
timed automata. The framework applies the compositional rule in an iterative 
fashion. Each iteration consists of three steps. In the first step, based on the
work in [7], a modified L* algorithm is presented to learn a timed assumption in
the form of a deterministic one-clock timed automaton (DOTA) using
membership queries. Then two further steps are conducted to check whether the
learned assumption satisfies the two premises of the proof rule via candidate
queries. We design an algorithm for model conversion with polynomial complexity,
which is executed as a step preceding the above iterative steps. It converts the
input models $M_1$, $M_2$ and $\phi$ to the output ones, which contain the clock
reset information for the assumption learning. Thus, the complexity of the
learning step in the framework in total is polynomial. We show this conversion
preserves the verification results.

We further prove the correctness and termination of the compositional
verification. We would like to note that the framework we propose applies to
the verification of systems with a number of components. In other words, though
the assumption learned is a DOTA, $M_1$ and $M_2$ can be compositions of several
DOTAs. For this, we design a heuristic to transform multi-clock reset
information to one-clock reset information, which enables the framework to handle
learning-based compositional verification for multi-clock systems. We also
propose two improvements to speed up the verification, which are shown to have
different advantages in different experimental cases. Finally, we implement the
framework and conduct comparative experiments with UPPAAL [10,11] on cases of
the AUTOSAR (Automotive Open System Architecture) benchmark [1]. The
experiments show that the framework proposed in this paper performs better
than UPPAAL provided the properties to be checked are satisfied.

The rest of the paper is organized as follows. In Sect. 2, we introduce back- 
ground knowledge. We present in Sect. 3 our learning-based compositional verifi- 
cation framework, as well as the proofs of termination and correctness. In Sect. 4, 
we present the two improvements. We report the experimental results in Sect. 5. 
Finally, we discuss the conclusions of the paper in Sect. 6. 


2 Preliminaries 


We use $\mathbb{N}$ to denote the set of natural numbers, $\mathbb{R}_{\geq 0}$ the
set of non-negative reals, and let $\mathbb{B} = \{\top, \bot\}$, where $\top$ and
$\bot$ stand for true and false, respectively.


2.1 Timed Automata 


Let $X$ be a finite set of real-valued variables ranged over by $x$, $y$, etc.
standing for clocks. A clock valuation for $X$ is a function
$\nu : X \to \mathbb{R}_{\geq 0}$ which associates every clock $x$ with a value
$\nu(x) \in \mathbb{R}_{\geq 0}$. For $t \in \mathbb{R}_{\geq 0}$, let $\nu + t$
denote the clock valuation which maps every clock $x \in X$ to the value
$\nu(x) + t$. For a set $\gamma \subseteq X$ and a valuation $\nu$, we use
$[\gamma \to 0]\nu$ to denote the valuation which resets all clock variables in
$\gamma$ to 0 and agrees with $\nu$ for other clocks in $X \setminus \gamma$.

We use $\Phi(X)$ to denote the set of clock constraints over $X$ of the form
$\varphi ::= \top \mid x_1 \bowtie m \mid x_1 - x_2 \bowtie m \mid \varphi \wedge \varphi$,
where $x_1, x_2 \in X$, $m \in \mathbb{N}$ and
$\bowtie \in \{=, <, >, \leq, \geq\}$. We use $\varphi(\nu) = \top$ to mean that the
clock valuation $\nu$ for $X$ satisfies the clock constraint $\varphi$ over $X$,
i.e. $\varphi$ evaluates to true using the values given by $\nu$.


Definition 1 (Timed Automata). A timed automaton (TA) is a 6-tuple
$M = (Q, q_0, \Sigma, F, X, \Delta)$, where $Q$ is a finite set called the
locations, $q_0 \in Q$ is the initial location, $\Sigma$ is a finite set called
the alphabet, $F \subseteq Q$ is the set of accepting locations, $X$ is the finite
set of clocks, and $\Delta \subseteq Q \times \Sigma \times \Phi(X) \times 2^X \times Q$
is a finite set called the transitions.


A transition $\delta \in \Delta$ is a 5-tuple $(q, \sigma, \varphi, \gamma, q')$,
where $q, q' \in Q$ are respectively the source and target locations,
$\sigma \in \Sigma$ is an action, $\varphi$ is a clock constraint over $X$ which is
called the guard of the transition and specifies that the transition is enabled
when it is true in the source state, and the set $\gamma \subseteq X$ gives the
clocks reset by this transition. Thus, $\delta$ allows a jump from $q$ to $q'$ by
performing an action $\sigma$ if it is enabled, i.e. $\varphi(\nu) = \top$. We use
$\delta[i]$ to denote the $i$-th element of the tuple
$\delta = (q, \sigma, \varphi, \gamma, q')$ for $i = 1, \ldots, 5$. A run $\rho$ of
$M$ is a finite sequence of transitions

$$\rho = (q_0, \nu_0) \xrightarrow{\sigma_1, t_1} (q_1, \nu_1) \xrightarrow{\sigma_2, t_2} \cdots \xrightarrow{\sigma_n, t_n} (q_n, \nu_n)$$

where $\nu_0(x) = 0$ for every $x \in X$, and for all $1 \leq i \leq n$ there exists
a transition $(q_{i-1}, \sigma_i, \varphi_i, \gamma_i, q_i) \in \Delta$ such that
$\varphi_i(\nu_{i-1} + t_i) = \top$ and $\nu_i = [\gamma_i \to 0](\nu_{i-1} + t_i)$.
If $q_n$ is an accepting location, we say $\rho$ is an accepting run of $M$. Each
pair $(\sigma_i, t_i) \in \Sigma \times \mathbb{R}_{\geq 0}$ in the run $\rho$ is
called a timed action that indicates the action $\sigma_i$ is applied after $t_i$
time units since the occurrence of the previous action.

The timed trace of $\rho$ is a timed word
$trace(\rho) = (\sigma_1, t_1)(\sigma_2, t_2) \ldots (\sigma_n, t_n)$.
Since the time value $t_i$ represents delay time, we also call such a timed trace
a delay-timed word, denoted by $\omega$. Adding the reset information along
$\omega$, we get the corresponding reset-delay-timed word, denoted by
$\omega_r = trace_r(\rho) = (\sigma_1, t_1, \gamma_1)(\sigma_2, t_2, \gamma_2) \cdots (\sigma_n, t_n, \gamma_n)$.
Notice that here $\gamma_i$ is a clock set $\gamma_i \subseteq X$ which records the
reset clocks in the corresponding transition when taking timed action
$(\sigma_i, t_i)$.

If $\rho$ is an accepting run of $M$, $trace(\rho)$ is called an accepting timed
word. The recognized timed language of $M$ is the set of its accepting
delay-timed words, i.e.
$\mathcal{L}(M) = \{trace(\rho) \mid \rho \text{ is an accepting run of } M\}$.
The recognized reset-delay-timed language $\mathcal{L}_r(M)$ is defined as
$\{trace_r(\rho) \mid \rho \text{ is an accepting run of } M\}$. A TA $M$ is
deterministic iff for any given delay-timed word $\omega$, there is at most one
run $\rho$ in $M$ having $trace(\rho) = \omega$.

For a run $\rho$, we define the corresponding logical-timed word
$\omega_l = (\sigma_1, \nu_1)(\sigma_2, \nu_2) \cdots (\sigma_n, \nu_n)$, where
$\nu_i \in \mathbb{R}_{\geq 0}^{|X|}$ is the vector which records the values for
all clocks in $X$. Therefore, delay-timed words and logical-timed words describe
the operations of the timed model $M$ from different perspectives. The former
describe $M$ from the external perspective, recording the actions and time
intervals between two consecutive actions, while the latter describe it from the
internal perspective, recording the actions and the specific values of internal
clocks when the actions occur. Both are necessary for the active learning
algorithm described in Sect. 2.2.

Given the clock reset information $\gamma_i$ along the run $\rho$ over the
delay-timed word $\omega = (\sigma_1, t_1)(\sigma_2, t_2) \ldots (\sigma_n, t_n)$,
we can obtain $\omega$'s corresponding logical-timed word
$\omega_l = (\sigma_1, \nu_1)(\sigma_2, \nu_2) \cdots (\sigma_n, \nu_n)$ by taking

$$\nu_i[j] = \begin{cases} t_i, & \text{if } i = 1 \text{ or } x_j \in \gamma_{i-1} \text{ for all } 2 \leq i \leq n; \\ \nu_{i-1}[j] + t_i, & \text{otherwise,} \end{cases} \tag{1}$$

where $1 \leq j \leq |X|$ and $\nu_i[j]$ is the $j$-th element in $\nu_i$. We use
$\Gamma$ to denote the mapping from the delay-timed words to the logical-timed
words, that is, $\Gamma(\omega) = \omega_l$. With the reset information along the
run $\rho$, we have the reset-logical-timed word
$\omega_{rl} = (\sigma_1, \nu_1, \gamma_1)(\sigma_2, \nu_2, \gamma_2) \cdots (\sigma_n, \nu_n, \gamma_n)$.
We can extend the mapping $\Gamma$ to a mapping from the reset-delay-timed words
to the reset-logical-timed words.
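
As an illustration of Eq. (1), the following minimal Python sketch maps a reset-delay-timed word to its logical-timed word. The function name, the tuple encoding of words, and the ordered list of clock names are assumptions made for this example and are not part of the original framework.

```python
from typing import Iterable, List, Sequence, Set, Tuple

def to_logical_timed_word(
    reset_delay_word: Iterable[Tuple[str, float, Set[str]]],
    clocks: Sequence[str],
) -> List[Tuple[str, Tuple[float, ...]]]:
    """Apply Eq. (1) to triples (action, delay, reset clocks).

    Returns pairs (action, clock values at the moment the action occurs).
    """
    logical = []
    prev_values: List[float] = []
    prev_resets: Set[str] = set()
    for i, (action, delay, resets) in enumerate(reset_delay_word):
        values = []
        for j, clock in enumerate(clocks):
            if i == 0 or clock in prev_resets:
                # first action, or the clock was reset by the previous transition
                values.append(delay)
            else:
                # the clock kept running since the previous action
                values.append(prev_values[j] + delay)
        logical.append((action, tuple(values)))
        prev_values, prev_resets = values, set(resets)
    return logical
```

Note that the resets of the last transition do not influence the result, since they only affect clock values after the final action.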

The recognized logical-timed language of $M$ is given as
$L(M) = \{\Gamma(trace(\rho)) \mid \rho \text{ is an accepting run of } M\}$, and
the recognized reset-logical-timed language of $M$ is
$L_r(M) = \{\Gamma(trace_r(\rho)) \mid \rho \text{ is an accepting run of } M\}$.


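Definition 2 (Projection of Delay-Timed Words). Given a delay-timed
word $\omega = (\sigma_1, t_1)(\sigma_2, t_2) \ldots (\sigma_n, t_n) \in (\Sigma_1 \times \mathbb{R}_{\geq 0})^*$
and an alphabet $\Sigma_2$, the projection of $\omega$ to $\Sigma_2$ is a
delay-timed word, denoted by $\omega|_{\Sigma_2}$, and defined as follows:

$$\omega|_{\Sigma_2} = \Big(\sigma_{i_1}, \sum_{j=1}^{i_1} t_j\Big)\Big(\sigma_{i_2}, \sum_{j=i_1+1}^{i_2} t_j\Big) \cdots \Big(\sigma_{i_m}, \sum_{j=i_{m-1}+1}^{i_m} t_j\Big) \tag{2}$$

where $\sigma_{i_k} \in \Sigma_2$ is the $i_k$-th action in $\omega$, $1 \leq k \leq m$.

Therefore, $\omega|_{\Sigma_2}$ restricts each action $\sigma_{i_k}$ to be in
$\Sigma_2$ and modifies the corresponding delay time of $\sigma_{i_k}$ to be the
time interval between $\sigma_{i_{k-1}}$ and $\sigma_{i_k}$ in $\omega$. For
instance, let $\omega = (a,1)(b,3)(a,1)(c,4)(a,2)$ and $\Sigma_2 = \{b,c\}$, then
the corresponding $\omega|_{\Sigma_2} = (b,4)(c,5)$.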
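
The projection of Definition 2 can be sketched in a few lines of Python. The function name and the list-of-pairs encoding of delay-timed words are assumptions made for this illustration.

```python
from typing import Iterable, List, Set, Tuple

def project(word: Iterable[Tuple[str, float]], alphabet: Set[str]) -> List[Tuple[str, float]]:
    """Project a delay-timed word onto `alphabet`.

    Delays of dropped actions are accumulated into the next kept action,
    so each kept delay is the time since the previous kept action.
    """
    projected = []
    elapsed = 0
    for action, delay in word:
        elapsed += delay
        if action in alphabet:
            projected.append((action, elapsed))
            elapsed = 0  # restart the interval after a kept action
    return projected

# The example from the text: w = (a,1)(b,3)(a,1)(c,4)(a,2), alphabet {b, c}
print(project([("a", 1), ("b", 3), ("a", 1), ("c", 4), ("a", 2)], {"b", "c"}))
```

The trailing action `(a, 2)` is dropped together with its delay, because no later action in the alphabet absorbs it.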


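Definition 3 (Parallel Composition of Timed Automata). Given two
timed automata $M_1 = (Q_1, q_0^1, \Sigma_1, F_1, X_1, \Delta_1)$ and
$M_2 = (Q_2, q_0^2, \Sigma_2, F_2, X_2, \Delta_2)$, assume that the clock sets
$X_1$ and $X_2$ are disjoint. Their parallel composition is a TA
$M_1 \| M_2 = (Q_1 \times Q_2, (q_0^1, q_0^2), \Sigma_1 \cup \Sigma_2, F_1 \times F_2, X_1 \cup X_2, \Delta)$
where the transitions $\Delta$ are as follows:

- for $\sigma \in \Sigma_1 \cap \Sigma_2$, for every
  $\delta_1 : (q_1, \sigma, \varphi_1, \gamma_1, q_1') \in \Delta_1$ and
  $\delta_2 : (q_2, \sigma, \varphi_2, \gamma_2, q_2') \in \Delta_2$,
  $((q_1, q_2), \sigma, \varphi_1 \wedge \varphi_2, \gamma_1 \cup \gamma_2, (q_1', q_2')) \in \Delta$.

- for $\sigma \in \Sigma_1 \setminus \Sigma_2$, for every
  $\delta_1 : (q_1, \sigma, \varphi_1, \gamma_1, q_1') \in \Delta_1$ and every
  $q \in Q_2$, $((q_1, q), \sigma, \varphi_1, \gamma_1, (q_1', q)) \in \Delta$.

- for $\sigma \in \Sigma_2 \setminus \Sigma_1$, for every
  $\delta_2 : (q_2, \sigma, \varphi_2, \gamma_2, q_2') \in \Delta_2$ and every
  $q \in Q_1$, $((q, q_2), \sigma, \varphi_2, \gamma_2, (q, q_2')) \in \Delta$.

The language of the composition is the set of accepting delay-timed words
and $\mathcal{L}(M_1 \| M_2) = \{\omega \mid \omega \in ((\Sigma_1 \cup \Sigma_2) \times \mathbb{R}_{\geq 0})^* \text{ and } \omega|_{\Sigma_i} \in \mathcal{L}(M_i), i \in \{1, 2\}\}$.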
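
The three transition rules of Definition 3 can be sketched as follows. The `TA` record, the use of strings for guards, and the encoding of a conjunction as the tuple `("and", g1, g2)` are assumptions made for this example only; guards are treated as opaque values and never evaluated.

```python
from itertools import product
from typing import Any, NamedTuple

class TA(NamedTuple):
    locations: frozenset
    initial: Any
    alphabet: frozenset
    accepting: frozenset
    clocks: frozenset
    transitions: frozenset  # tuples (source, action, guard, resets, target)

def compose(m1: TA, m2: TA) -> TA:
    """Parallel composition: synchronize on shared actions, interleave the rest."""
    assert not (m1.clocks & m2.clocks), "clock sets must be disjoint"
    shared = m1.alphabet & m2.alphabet
    trans = set()
    for (q1, a, g1, r1, p1) in m1.transitions:
        if a in shared:
            # rule 1: both components move together on a shared action
            for (q2, b, g2, r2, p2) in m2.transitions:
                if b == a:
                    trans.add(((q1, q2), a, ("and", g1, g2), r1 | r2, (p1, p2)))
        else:
            # rule 2: only the first component moves
            for q in m2.locations:
                trans.add(((q1, q), a, g1, r1, (p1, q)))
    for (q2, a, g2, r2, p2) in m2.transitions:
        if a not in shared:
            # rule 3: only the second component moves
            for q in m1.locations:
                trans.add(((q, q2), a, g2, r2, (q, p2)))
    return TA(
        frozenset(product(m1.locations, m2.locations)),
        (m1.initial, m2.initial),
        m1.alphabet | m2.alphabet,
        frozenset(product(m1.accepting, m2.accepting)),
        m1.clocks | m2.clocks,
        frozenset(trans),
    )
```

Reset sets are expected to be `frozenset` values so that transitions remain hashable.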


Definition 4 (Language Inclusion). Given two timed automata $M_1$ and $M_2$,
if $\mathcal{L}(M_1)|_{\Sigma_2} = \{\omega|_{\Sigma_2} \mid \omega \in \mathcal{L}(M_1)\}$
is a subset of $\mathcal{L}(M_2)$, we say $M_1$ satisfies $M_2$, denoted by
$M_1 \models M_2$.


Definition 5 (Deterministic One-Clock Timed Automata). A one-clock
timed automaton (OTA) is a timed automaton with only one clock. A
deterministic OTA is denoted by DOTA.

2.2 Learning Deterministic One-Clock Timed Automata


In this section, we briefly describe the active learning algorithm for a DOTA
$M$. We refer to [7] for more details. Active learning of a DOTA assumes the
existence of a teacher who can answer two kinds of queries, membership and
candidate queries, posed by a learner. A membership query asks whether
$\omega_l \in L(M)$ for a logical-timed word $\omega_l$; and a candidate query asks
whether the learned DOTA $A$, which represents the assumption, satisfies
$\mathcal{L}(A) = \mathcal{L}(M)$. The main challenge for learning the timed
assumption is to obtain the reset information of the logical clocks for each
transition. We consider two different settings, depending on whether the teacher
also provides clock reset information along with answers to queries.

A smart teacher is one which provides clock reset information along with
answers to queries. It accepts a logical-timed word $\omega_l$ as an input for the
membership query from the learner. It then returns an answer about whether the
timed word is accepted or not, together with the reset information of each
transition along the trace, that is, the reset-logical-timed word $\omega_{rl}$.

When the smart teacher takes a candidate query from the learner, a
counterexample is yielded and provided as a reset-delay-timed word. The algorithm
maintains a timed observation table $T$ to store answers from all previous
queries. Once the learner has gained sufficient information, i.e. $T$ is closed
and consistent, an assumption $A$ is constructed from the table. Then the learner
poses a candidate query to the teacher to judge if
$\mathcal{L}(A) = \mathcal{L}(M)$. If yes, the algorithm terminates with the
learned model $A$. Otherwise, the teacher responds with a reset-delay-timed word
$\omega_r$ as a counterexample. After processing $\omega_r$, the algorithm starts a
new round of learning. The whole procedure repeats until the teacher gives a
positive answer to a candidate query. It is known that the complexity of the
algorithm is polynomial in the size of the learned model. In practical
applications, this corresponds to the case where some parts of the model
(information of clock reset) are known by testing or watchdogs.

In the case when a normal teacher is used, the learner needs to guess the
reset information on each transition discovered in the observation table. At each
iteration, the learner guesses all needed reset information and forms a number of
table candidates. Due to the required guesses, the complexity of the algorithm
is exponential in the size of the learned model. The following theorem, which is
presented in [7], shows that for both types of teachers, the algorithm converts
the learning problem to that of learning the reset-logical-timed language.


Theorem 1. Given two DOTAs $M$ and $A$, if $L_r(M) = L_r(A)$, then
$\mathcal{L}(M) = \mathcal{L}(A)$.


3 Framework for Learning-Based Compositional 
Verification of Timed Automata 


Consider a system $M = M_1 \| M_2$ consisting of two deterministic timed automata
and a safety property $\phi$ represented as a deterministic timed automaton. We
devote this section to presenting our learning-based verification framework for
automatically finding an appropriate assumption $A$ in the AG rule to verify that
$M$ satisfies $\phi$. Section 3.1 first describes the framework. Then, in
Sects. 3.2, 3.3 and 3.4, the main algorithms of the framework are presented in
detail. Finally, Sect. 3.5 shows the correctness and termination of the framework.


3.1 Verification Framework via Assumption Learning 


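Let $\Sigma_1$, $\Sigma_2$ and $\Sigma_\phi$ be the alphabets of the TAs $M_1$,
$M_2$ and $\phi$, respectively. We then have that the alphabet of the assumption
$A_\phi$ is $\Sigma_{A_\phi} = (\Sigma_1 \cup \Sigma_\phi) \cap \Sigma_2$.
The AG rule is stated as follows:

$$\frac{M_1 \| A_\phi \models \phi \qquad M_2 \models A_\phi}{M_1 \| M_2 \models \phi} \tag{3}$$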


The rule converts the problem of verifying $M_1 \| M_2 \models \phi$ to that of
finding an assumption $A_\phi$ which is a DOTA satisfying both
$M_1 \| A_\phi \models \phi$ and $M_2 \models A_\phi$. Here, we consider $M_1$ and
$M_2$ as general TAs, which are either a DOTA or compositions of a number of
DOTAs. Therefore, the framework we propose is not only applicable to verifying
the composition of just two components. For a system composed of $n$ components,
where $n > 2$, we can partition the components into two parts. For instance, if a
system consists of 4 components $M = \{H_1, H_2, H_3, H_4\}$, we can let
$M_1 = H_1 \| H_3$ and $M_2 = H_2 \| H_4$. In order to automatically obtain the
assumption, we use model learning algorithms. However, the current learning
algorithm for DOTA [7] is not directly applicable. We thus design a “smart
teacher” with a heuristic to answer clock reset information for the learning. For
this, we also need to design a model conversion algorithm. We illustrate the
learning-based verification framework in Fig. 1. The inputs of the framework are
$M_1$, $M_2$ and property $\phi$, and the verification process consists of four
steps, which we describe below.


The First Step. This step converts the input models into TAs $M_1'$, $M_2'$ and
$\phi'$ (see Sect. 3.2) without changing the verification results, i.e. checking
$M_1' \| M_2'$ against $\phi'$ is equivalent to checking $M_1 \| M_2$ against
$\phi$. The output of this step is utilized to determine the clock reset
information for the assumption learning in the second step. Then, the AG rule (3)
is applied to $M_1'$, $M_2'$ and $\phi'$. Thus, if there exists an assumption $A$
such that $M_1' \| A \models \phi'$ and $M_2' \models A$, then
$M_1' \| M_2' \models \phi'$. The weakest assumption $A_w$ is the one with which
the rule is guaranteed to return conclusive results and
$M_1' \| A_w \models \phi'$.


Definition 6 (Weakest Assumption). Let $M_1'$, $M_2'$ and $\phi'$ be the models
mentioned above and $\Sigma_A = (\Sigma_1' \cup \Sigma_{\phi'}) \cap \Sigma_2'$.
The weakest assumption $A_w$ of $M_1'$ is a timed automaton such that the two
conditions hold: 1) $\Sigma_{A_w} = \Sigma_A$, and 2) for any timed automaton $E$
with $\Sigma_E = \Sigma_A$ and $M_2' \models E$, $M_1' \| E \models \phi'$ iff
$E \models A_w$.

Fig. 1. Learning-based compositional verification framework for timed automata


The Second Step. A DOTA assumption $A$ is learned through a number of
membership queries in this step. The answer to each query involves gaining the
definite clock reset information for each timed word, i.e. whether the clock of
$A$ is reset when an action is taken at a specific time. We design a heuristic to
obtain such information from the clock reset information of the converted models
$M_1'$, $M_2'$ and $\phi'$. This allows the framework to handle learning-based
compositional verification for multi-clock systems. We refer to Sect. 3.3 for
more details.


The Third and the Fourth Steps. Once the assumption $A$ is constructed,
two candidate queries start for checking the compositional rule. The first is a
subset query to check whether $M_1' \| A \models \phi'$. The second is a superset
query to check whether $M_2' \models A$. If both candidate queries return true,
the compositional rule guarantees that $M_1' \| M_2' \models \phi'$. Otherwise, a
counterexample $ctx$ (either $ctx_1$ or $ctx_2$ in Fig. 1) is generated and
further analyzed to identify whether $ctx$ is a witness of the violation of
$M_1' \| M_2' \models \phi'$. If it does not show the violation, $ctx$ is used to
update $A$ in the next learning iteration. The details about candidate queries
are discussed in Sect. 3.4.

Therefore, $\mathcal{L}(A)$ is a subset of $\mathcal{L}(A_w)$ and a superset of
$\mathcal{L}(M_2')|_{\Sigma_A}$. It is not guaranteed that a DOTA $A$ can be
learned to satisfy $\mathcal{L}(A) = \mathcal{L}(A_w)$. However, as shown later in
Theorem 3, under the condition that $\mathcal{L}(A_w)$ is accepted by a DOTA, the
learning process terminates when compositional verification returns a conclusive
result, often before $\mathcal{L}(A_w)$ is computed. This means that verification
in the framework usually terminates earlier by finding either a counterexample
that verifies that $M_1' \| M_2' \not\models \phi'$ or an assumption $A$ that
satisfies the two premises in the reasoning rule, indicating
$M_1' \| M_2' \models \phi'$.

Algorithm 1: ConvertW($M_1$, $M_2$, $\phi$)
  input : Two models $M_1$ and $M_2$ and the property $\phi$ to be verified
  output: Converted timed automata $M_1'$, $M_2'$ and property $\phi'$
  1  $M_1'', \phi'', M_2'' \leftarrow$ ConvertS($M_1$, $\phi$, $M_2$);
  2  $\phi', M_1', M_2' \leftarrow$ ConvertS($\phi''$, $M_1''$, $M_2''$);
  3  return $M_1'$, $M_2'$, $\phi'$;
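
The iteration over the second, third and fourth steps of Sect. 3.1 can be summarized by the following control-flow sketch. It is only an outline under stated assumptions: the four callables (`learn`, `check_premise1`, `check_premise2`, `is_real_violation`) are hypothetical placeholders for the learner, the two candidate queries and the counterexample analysis, and the `max_rounds` bound is added here purely to keep the sketch terminating.

```python
from typing import Any, Callable, List, Optional, Tuple

def ag_learning_loop(
    learn: Callable[[List[Any]], Any],
    check_premise1: Callable[[Any], Tuple[bool, Optional[Any]]],
    check_premise2: Callable[[Any], Tuple[bool, Optional[Any]]],
    is_real_violation: Callable[[Any], bool],
    max_rounds: int = 1000,
) -> Tuple[bool, Any]:
    """Return (True, assumption) if both premises hold,
    or (False, counterexample) if a real violation is found."""
    counterexamples: List[Any] = []
    for _ in range(max_rounds):
        # second step: learn a hypothesis from all counterexamples seen so far
        assumption = learn(counterexamples)
        # third and fourth steps: the two candidate queries, in order
        for check in (check_premise1, check_premise2):
            holds, ctx = check(assumption)
            if holds:
                continue
            if is_real_violation(ctx):
                return False, ctx          # the property is violated
            counterexamples.append(ctx)    # spurious: refine the hypothesis
            break
        else:
            return True, assumption        # both premises hold
    raise RuntimeError("no conclusive result within max_rounds")
```

The `for ... else` construct returns the assumption only when neither candidate query produced a counterexample in the current round.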


3.2 Model Conversion 


We use membership queries to learn the DOTA assumption. For a membership
query with the input of a logical-timed word $\omega_l$, an answer from the
teacher is the clock reset information of the word, which is necessary for
obtaining the reset-logical-timed word $\omega_{rl}$. As shown in [7], the learning
algorithm with a normal teacher can only generate the answer by guessing reset
information and this is the cause of high complexity. We thus design a smart
teacher in our framework. The smart teacher generates the answer to a query with
the input $\omega_l$ by directly making use of the available clock reset knowledge
of $\mathcal{L}(A_w)$ (related to $\Sigma_A$, $M_1$ and $\phi$). To this end, we
implement the model conversion from the models $M_1$, $M_2$ and $\phi$ to the
models $M_1'$, $M_2'$ and $\phi'$, respectively.

The model conversion algorithm is mainly to ensure that each action in $\Sigma_A$
corresponds to unique clock reset information. Given an action $\sigma$ having
$\sigma \in \Sigma_A$ and $\sigma \in \Sigma_1$ (resp. $\Sigma_\phi$), if there is
only one transition by $\sigma$ or all its different transitions have the same
reset clocks, i.e. for any transitions $\delta_1$ and $\delta_2$,
$\delta_1[4] = \delta_2[4]$ if $\delta_1[2] = \delta_2[2] = \sigma$, the reset
information for the action $\sigma$ is simply $\delta[4]$ of any particular
transition $\delta$ by $\sigma$. If there are different transitions by $\sigma$,
say $\delta_1$ and $\delta_2$, which have different reset clocks, i.e.
$\delta_1[4] \neq \delta_2[4]$, we say that the reset clocks of action $\sigma$
are inconsistent.

Reset clock inconsistency causes difficulty for the teacher to obtain the clock
reset information of an action in a whole run. To deal with this difficulty, we
design model conversion in Algorithm 1 to convert $M_1$, $M_2$ and $\phi$ into
$M_1'$, $M_2'$ and $\phi'$. In the algorithm, the conversion is implemented by
calling Algorithm 2 twice to introduce auxiliary actions and transitions into
$M_1$ and $\phi$ to resolve reset clock inconsistency in the two automata,
respectively.

The converted models $M_1'$, $M_2'$ and $\phi'$ returned by the invocations to
Algorithm 2 have the property that all transitions with the same action
$\sigma \in \Sigma_A$ will have the same reset clocks, and thus $M_1'$ and $\phi'$
do not have reset clock inconsistency. As shown later in Theorem 2, the
verification of $M_1' \| M_2'$ against $\phi'$ is equivalent to that of
$M_1 \| M_2$ against $\phi$.

Algorithm 2, denoted by ConvertS($M_1$, $M_2$, $M_3$), takes three deterministic
TAs, namely $M_1$, $M_2$ and $M_3$, as its input and converts them into three
new TAs, namely $M_1'$, $M_2'$ and $M_3'$, as the output. We explain the three
main functionalities of the algorithm in the following three paragraphs.


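Algorithm 2: ConvertS($M_1$, $M_2$, $M_3$)
  input : Three timed automata $M_1$, $M_2$, $M_3$
  output: Converted timed automata $M_1'$, $M_2'$ and $M_3'$
   1  $M_1' \leftarrow M_1$, $M_2' \leftarrow M_2$, $M_3' \leftarrow M_3$;
   2  $\Sigma \leftarrow (\Sigma_{M_1} \cup \Sigma_{M_2}) \cap \Sigma_{M_3}$;
   3  $f \leftarrow \emptyset$;
   4  for $\delta \in \Delta_1'$ do
   5      if $\delta[2] \in \Sigma$ and $\delta[2] \notin dom(f)$ then
   6          put $(\delta[2], \delta[4])$ into $f$;
   7      else if $\delta[2] \in dom(f)$ and $(\delta[2], \delta[4]) \notin f$ then
   8          $\sigma \leftarrow \delta[2]$;
   9          $\sigma_{new} \leftarrow$ introduce_new_action;
  10          put $\sigma_{new}$ into $\Sigma_{M_1'}$, $\Sigma_{M_2'}$, $\Sigma_{M_3'}$;
  11          for $\delta' \in \{\omega \mid \omega \in \Delta_1' \text{ and } \omega[2] = \sigma \text{ and } \omega[4] = \delta[4]\}$ do
  12              $\delta'[2] \leftarrow \sigma_{new}$;
  13          for $\delta \in \{\omega \mid \omega \in \Delta_2' \text{ and } \omega[2] = \sigma\}$ do
  14              $\delta' \leftarrow$ clone($\delta$);
  15              $\delta'[2] \leftarrow \sigma_{new}$;
  16              put $\delta'$ into $\Delta_2'$;
  17          for $\delta \in \{\omega \mid \omega \in \Delta_3' \text{ and } \omega[2] = \sigma\}$ do
  18              $\delta' \leftarrow$ clone($\delta$);
  19              $\delta'[2] \leftarrow \sigma_{new}$;
  20              put $\delta'$ into $\Delta_3'$;
  21  return $M_1'$, $M_2'$, $M_3'$;


Check Reset Information in $M_1$ (Lines 1-6). Let
$\Sigma = (\Sigma_{M_1} \cup \Sigma_{M_2}) \cap \Sigma_{M_3}$ and $f$ be a binary
relation between $\Sigma$ and $2^X$, where $X$ is the set of clocks of $M_1$, and
$f = \emptyset$ initially. The transitions of $M_1$ are checked one by one. For a
transition $\delta$, if its action $\delta[2]$ is in $\Sigma$ but not in the domain
of $f$ (Line 5), then $\delta$ is the first transition by $\delta[2]$ found, and
thus the pair $(\delta[2], \delta[4])$ is added to the relation $f$. If the action
of $\delta$ is already in $dom(f)$ but the reset clocks $\delta[4]$ are
inconsistent with the records in $f$, the algorithm proceeds to the next steps to
handle the inconsistency of the reset clocks (Lines 7-20).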


Introduce Auxiliary Actions in $M_1'$ (Lines 7-12). If
$\delta[2] \in dom(f) \wedge (\delta[2], \delta[4]) \notin f$ (Line 7), we need to
introduce a new action (through the variable $\sigma_{new}$) and add it to the
alphabets of the output models. Then the transition $\delta$ with action $\sigma$
is modified to a new transition, say $\delta'$, by replacing action $\sigma$ with
the value of $\sigma_{new}$ (Lines 11-12).


Add Auxiliary Transitions in $M_2'$ and $M_3'$ (Lines 13-20). Since new actions
are introduced in $M_1'$, we need to add auxiliary transitions with each new
action in $M_2'$ and $M_3'$ accordingly. Specifically, consider the case when
$M_1$ and $M_2$ synchronize on action $\sigma$ via transitions $\delta_1$ and
$\delta_2$ in the models, respectively. If $\delta_1$ in $M_1$ is modified to
$\delta_1'$ in $M_1'$ by renaming its action $\sigma$ to $\sigma'$, a fresh
co-transition $\delta_2'$ should be added to $M_2'$, which is a copy of $\delta_2$
with $\sigma$ changed to $\sigma'$, so as to preserve the synchronisation in the
composition of $M_1'$ and $M_2'$ (Lines 13-16). The same changes are made for
$M_3$ (Lines 17-20).
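
A minimal Python sketch of Algorithm 2 is given below for illustration. The dictionary encoding of a model (an `"alphabet"` set and a `"transitions"` list of mutable 5-element lists), the 0-based indices (`t[1]` for the action, `t[3]` for the reset set), and the fresh-name scheme `a#1`, `a#2`, ... are all assumptions of this example rather than details taken from the paper.

```python
import copy

def convert_s(m1, m2, m3):
    """Sketch of ConvertS: make every shared action of m1 carry one reset set.

    Each model is a dict with keys "alphabet" (set of str) and
    "transitions" (list of [source, action, guard, resets, target]).
    """
    m1, m2, m3 = (copy.deepcopy(m) for m in (m1, m2, m3))   # line 1
    sigma = (m1["alphabet"] | m2["alphabet"]) & m3["alphabet"]
    first_resets = {}   # the relation f: action -> first reset set seen
    fresh = 0
    for delta in m1["transitions"]:
        action, resets = delta[1], delta[3]
        if action in sigma and action not in first_resets:
            first_resets[action] = resets
        elif action in first_resets and first_resets[action] != resets:
            fresh += 1
            new_action = f"{action}#{fresh}"                 # hypothetical naming
            for m in (m1, m2, m3):
                m["alphabet"].add(new_action)
            # rename the inconsistent transitions of m1 (lines 11-12)
            for other in m1["transitions"]:
                if other[1] == action and other[3] == resets:
                    other[1] = new_action
            # add renamed copies of the co-transitions (lines 13-20)
            for m in (m2, m3):
                clones = [list(t) for t in m["transitions"] if t[1] == action]
                for clone in clones:
                    clone[1] = new_action
                m["transitions"].extend(clones)
    return m1, m2, m3
```

Working on deep copies keeps the input models untouched, which mirrors line 1 of the algorithm.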


Example 1. Fig. 2 shows an example of the conversion. In $M_1$, there are two
transitions that contain action $a$ but only one has a clock reset. To solve the
clock reset inconsistency of $M_1$, the new action $a'$ is introduced, and $M_1$
is converted into $M_1''$ by changing the action name $a$ of one transition to
$a'$, marked as an orange dashed line. In $M_2$ and $\phi$, by adding the
corresponding new transitions, $M_2''$ and $\phi''$ are achieved. In $\phi''$, the
transitions with $a$ and $a'$ still have different reset information, so it is
further changed to $\phi'$ by adding a transition marked as a blue dotted line.
Correspondingly, $M_1''$ and $M_2''$ are changed. Obviously, we can determine the
reset information of the transition with $a$ ($a'$, $a''$, $a'''$) in automata
$M_1'$ and $\phi'$.


Fig. 2. $M_1$, $M_2$ and $\phi$ are converted into $M_1'$, $M_2'$ and $\phi'$


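We now show that the verification of $M_1' \| M_2'$ against $\phi'$ is equivalent
to the original verification of $M_1 \| M_2$ against $\phi$.


Theorem 2. Checking $M_1' \| M_2' \models \phi'$ is equivalent to checking
$M_1 \| M_2 \models \phi$.


Proof. We prove $M_1 \| M_2 \not\models \phi \Leftrightarrow M_1' \| M_2' \not\models \phi'$.
This is equivalent to proving
$\mathcal{L}(M_1 \| M_2 \| \overline{\phi}) \neq \emptyset \Leftrightarrow \mathcal{L}(M_1' \| M_2' \| \overline{\phi'}) \neq \emptyset$,
where $\overline{\phi}$ and $\overline{\phi'}$ are the complements of $\phi$ and
$\phi'$, respectively.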

We first prove
$\mathcal{L}(M_1 \| M_2 \| \overline{\phi}) \neq \emptyset \Rightarrow \mathcal{L}(M_1' \| M_2' \| \overline{\phi'}) \neq \emptyset$.
The left hand side implies that $M_1 \| M_2 \| \overline{\phi}$ has at least one
accepting run $\rho$. According to the construction of $M_1'$, $M_2'$ and $\phi'$,
for the composed model $M_1' \| M_2' \| \overline{\phi'}$, compared with
$M_1 \| M_2 \| \overline{\phi}$, the locations and the guards of transitions remain
the same, although some auxiliary transitions have been added to the model where
actions are renamed. So we can construct a run $\rho'$ in
$M_1' \| M_2' \| \overline{\phi'}$ which visits the locations in the same order as
$\rho$. Since $\rho$ is an accepting run, its final location must be an accepting
one, which implies $\rho'$ is an accepting run of
$M_1' \| M_2' \| \overline{\phi'}$, and
$trace(\rho') \in \mathcal{L}(M_1' \| M_2' \| \overline{\phi'})$.

For
$\mathcal{L}(M_1' \| M_2' \| \overline{\phi'}) \neq \emptyset \Rightarrow \mathcal{L}(M_1 \| M_2 \| \overline{\phi}) \neq \emptyset$,
since $\mathcal{L}(M_1' \| M_2' \| \overline{\phi'}) \neq \emptyset$, there exists
at least one accepting run $\rho'$ in $M_1' \| M_2' \| \overline{\phi'}$. Still, by
the construction of $M_1'$, $M_2'$ and $\phi'$, we can construct an accepting run
$\rho$ in $M_1 \| M_2 \| \overline{\phi}$ by replacing the newly introduced actions
along $\rho'$ with their original names, and $trace(\rho)$ is an evidence of
$\mathcal{L}(M_1 \| M_2 \| \overline{\phi}) \neq \emptyset$.


Complexity. For the model conversion, Algorithm 1 mainly consists of two
invocations of Algorithm 2, which has a nested loop. In the worst case execution
of Algorithm 2, the transitions of $M_1$ in the outer loop and the transitions of
$M_1$, $M_2$ and $M_3$ in the inner loops are traversed, so the time complexity is
polynomial (quadratic) in the number of transitions.


3.3 Membership Queries 


After model conversion, a number of membership queries are used to learn
the DOTA assumption $A$. For each membership query, the learner provides
the teacher a logical-timed word $\omega_l = (\sigma_1, v_1)(\sigma_2, v_2)\cdots(\sigma_n, v_n)$ to
obtain clock reset information, where $\sigma_i \in \Sigma_A$ and $|v_i| = 1$. Based on
the converted model, the teacher supplements corresponding reset information
$\gamma_i$ for each $\sigma_i$ in $\omega_l$ to construct the reset-logical-timed word
$\omega_{rl} = (\sigma_1, v_1, \gamma_1)(\sigma_2, v_2, \gamma_2)\cdots(\sigma_n, v_n, \gamma_n)$. Though the learning algorithm we use is
associated with one clock and the hypothesis we obtain is always a DOTA, the
number of clocks in $M_1'$ and $\phi'$ might be multiple since they are not necessarily
DOTAs. This raises the question of how to transform the multi-clock reset
information to the single-clock reset information. To solve this problem, we use
a heuristic to generate the one-clock reset information $\gamma_i$ for each action $\sigma_i$. Let
$X$ be the finite set of clocks of $M_1'$ and $\phi'$, and $x$ be the single clock of the
learned assumption, where $|X| > 1$. For each action $\sigma_i$, we try four heuristics to
determine whether $x$ is reset: 1) random assignment, 2) $\gamma_i$ is always $\{x\}$, 3) $\gamma_i$
is always $\emptyset$, and 4) dynamic reset rule (if there exists a reset clock $y \in X$, then
$\gamma_i = \{x\}$, otherwise $\gamma_i = \emptyset$). We use the fourth since it yields the least
checking time in verification. After obtaining the reset-logical-timed word $\omega_{rl}$, the teacher further
checks whether it satisfies $\phi'$ under the environment of $M_1'$ by model checking whether
$M_1' \| A_{\omega_{rl}} \models \phi'$, where $A_{\omega_{rl}}$ is the automaton constructed from $\omega_{rl}$.
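The fourth heuristic (the dynamic reset rule) can be illustrated with a minimal sketch. The function name, the string encoding of clocks, and the default learner clock name "x" are assumptions made only for this illustration; they are not taken from the authors' tool.

```python
from typing import FrozenSet, Iterable


def dynamic_reset(resets_in_model: Iterable[str],
                  model_clocks: FrozenSet[str],
                  learner_clock: str = "x") -> FrozenSet[str]:
    """Dynamic reset rule (hypothetical helper).

    If the converted model resets at least one of its clocks on this action,
    the single clock of the learned assumption is reset as well; otherwise
    no clock is reset.
    """
    if any(clock in model_clocks for clock in resets_in_model):
        return frozenset({learner_clock})
    return frozenset()


# Example: the model has clocks y1 and y2; the action resets y1 only.
print(dynamic_reset({"y1"}, frozenset({"y1", "y2"})))
print(dynamic_reset(set(), frozenset({"y1", "y2"})))
```

The rule is deliberately coarse: it collapses any multi-clock reset into a reset of the single learner clock, which is why the resulting assumption is not guaranteed to match the weakest assumption.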

As shown in Fig.1, the step of model conversion is executed only once. It 
is then followed by the execution of the smart teacher we design, which only 
requires a polynomial number of membership queries for the assumption learn- 
ing. Without the first step, the framework needs to turn to a normal teacher, 
in which case the reset information is obtained by guessing, and an exponential 
number of membership queries are required. 

3.4 Candidate Queries


The candidate queries are to get answers about whether the learned hypothesis 
A satisfies the AG reasoning rule. 


The First Candidate Query. This step checks whether $M_1' \| A \models \phi'$. If the
answer is positive, we proceed to the second candidate query. Otherwise, a
counterexample $ctx_1$, with $ctx_1|_{\Sigma_A} \in \mathcal{L}(A)$, is generated and further analyzed by
constructing a TA $A_{ctx_1}$ such that $M_1' \| A_{ctx_1} \not\models \phi'$. We then check whether
$ctx_1|_{\Sigma_A} \in \mathcal{L}(M_2')|_{\Sigma_A}$. If the result is positive, we have $M_1' \| M_2' \not\models \phi'$. Otherwise,
$ctx_1|_{\Sigma_A} \in \mathcal{L}(A) \setminus \mathcal{L}(A_w)$ and $ctx_1$ serves as a negative counterexample to refine
assumption $A$ via the next round of membership queries.


The Second Candidate Query. This step checks whether $M_2' \models A$, i.e.
$\mathcal{L}(M_2')|_{\Sigma_A} \subseteq \mathcal{L}(A)$. If yes, as $M_1' \| A \models \phi'$ and $M_2' \models A$, the verification
algorithm terminates and we conclude $M_1' \| M_2' \models \phi'$. Otherwise, a counterexample
$ctx_2$ is generated and a TA $A_{ctx_2}$ is constructed from the timed word $ctx_2$. We
check whether $M_1' \| A_{ctx_2} \not\models \phi'$. If yes, as $ctx_2|_{\Sigma_A} \in \mathcal{L}(M_2')|_{\Sigma_A}$, we conclude
$M_1' \| M_2' \not\models \phi'$. Otherwise, $ctx_2|_{\Sigma_A} \in \mathcal{L}(A_w) \setminus \mathcal{L}(A)$ is a counterexample, indicating
that a new round of learning is needed to refine and check $A$ using membership and
candidate queries until a conclusive result is obtained.
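The interplay of the two candidate queries can be sketched on a deliberately simplified, untimed toy model. Everything in this sketch is an assumption made for illustration: languages are finite sets of words, `bad` stands for the words whose composition with $M_1'$ violates $\phi'$, `l2` stands for $\mathcal{L}(M_2')|_{\Sigma_A}$, and refinement simply adds or removes one word. The actual framework uses a DOTA learner and UPPAAL instead.

```python
from typing import Optional, Set, Tuple


def ag_loop(l2: Set[str], bad: Set[str],
            initial: Set[str]) -> Tuple[bool, Optional[str]]:
    """Toy assume-guarantee loop over finite languages (illustrative only)."""
    assumption = set(initial)
    while True:
        # First candidate query: the assumption must not admit a bad word.
        ctx1 = next((w for w in sorted(assumption) if w in bad), None)
        if ctx1 is not None:
            if ctx1 in l2:
                return False, ctx1          # real violation of the property
            assumption.discard(ctx1)        # negative counterexample
            continue
        # Second candidate query: every behaviour of M2' must be allowed.
        ctx2 = next((w for w in sorted(l2) if w not in assumption), None)
        if ctx2 is None:
            return True, None               # both premises hold
        if ctx2 in bad:
            return False, ctx2              # real violation of the property
        assumption.add(ctx2)                # positive counterexample


print(ag_loop({"a", "ab"}, {"ba"}, {"ba"}))
print(ag_loop({"ba"}, {"ba"}, set()))
```

In this toy setting the loop always terminates because a removed word is never re-added and an added word is never removed; the timed setting needs the DOTA existence condition discussed in Sect. 3.5.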


3.5 Correctness and Termination 
We now show the correctness and termination of the framework. 


Theorem 3. Given two deterministic timed automata $M_1$ and $M_2$, and property
$\phi$, if there exists a DOTA that accepts the target language $\mathcal{L}(A_w)$, where $A_w$ is
the weakest assumption of the converted model $M_2'$, the proposed learning-based
compositional verification returns true if $\phi$ holds on $M_1 \| M_2$ and false otherwise.


Proof. From Theorem 2, we only need to consider the converted models $M_1'$, $M_2'$
and $\phi'$.


Termination. The proposed framework consists of the steps of model conver- 
sion, membership and candidate queries. We argue about the termination of the 
overall framework by showing the termination of each step. 

By Algorithm 1 and Theorem 2, the step of model conversion terminates. 
Because the learning algorithm of DOTA terminates [7], assumption A will be 
obtained at last by membership queries. As to the candidate queries, they either
conclude $M_1' \| M_2' \models \phi'$ and then terminate, or provide a positive or negative
counterexample $ctx$, that is, $ctx|_{\Sigma_A} \in \mathcal{L}(A_w) \setminus \mathcal{L}(A)$ or $ctx|_{\Sigma_A} \in \mathcal{L}(A) \setminus \mathcal{L}(A_w)$,
for the refinement of $A$.

For the weakest assumption $A_w$, since there exists a DOTA which accepts
$\mathcal{L}(A_w)$, the framework eventually constructs $A_w$ in some round to produce
the positive answer $M_1' \| A_w \models \phi'$ to the first candidate query. As shown in
Sect. 3.4, we can check whether $\mathcal{L}(M_2')|_{\Sigma_A} \subseteq \mathcal{L}(A)$. If the result is positive, we
have $M_1' \| M_2' \models \phi'$ and the framework terminates. Otherwise, a counterexample
$ctx_2|_{\Sigma_A} \in \mathcal{L}(M_2')|_{\Sigma_A} \setminus \mathcal{L}(A_w)$ is generated. So $M_1' \| M_2' \not\models \phi'$, and $ctx_2$ is a
witness to the fact that $M_1' \| M_2'$ violates $\phi'$.


Correctness. Since there exists a DOTA that accepts the target language
$\mathcal{L}(A_w)$, the framework always eventually terminates with a result which is either
true or false. It is true only if both candidate queries return true, and this means
that $\phi'$ holds on $M_1' \| M_2'$. Otherwise, a counterexample $ctx|_{\Sigma_A} \notin \mathcal{L}(A_w)$ is
generated. Since $M_1' \| A_{ctx} \not\models \phi'$ and $ctx|_{\Sigma_A} \in \mathcal{L}(M_2')|_{\Sigma_A}$, we have $M_1' \| M_2' \not\models \phi'$.


It is possible that, in some cases, there is no DOTA that can accept $\mathcal{L}(A_w)$,
and the proposed verification framework cannot be guaranteed to terminate in these cases.
However, the framework is still sound, meaning that for the cases when a DOTA
assumption is learned and the verification terminates with a result, the result
holds. Therefore, the framework is able to handle more flexible models such as
multi-clock models. We will explore this with experiments in Sect. 5.


Theorem 4. Given two deterministic timed automata $M_1'$ and $M_2'$ which might
have multiple clocks, and property $\phi'$, even if there is no DOTA that accepts the
target language $\mathcal{L}(A_w)$, the proposed verification framework is still sound.


Proof. Given $M_1'$ and $M_2'$ which are multi-clock timed automata, suppose that in
some round the learned DOTA assumption $A$ satisfies $\mathcal{L}(A) \subseteq \mathcal{L}(A_w)$ and
$\mathcal{L}(M_2')|_{\Sigma_A} \subseteq \mathcal{L}(A)$. Then the results of both the first and second candidate
queries are positive. Hence, verification terminates and $M_1' \| M_2' \models \phi'$ holds. By
the same reasoning, in the case that a counterexample $ctx$ is generated with
$M_1' \| A_{ctx} \not\models \phi'$ and $ctx|_{\Sigma_A} \in \mathcal{L}(M_2')|_{\Sigma_A}$, this implies that $M_1' \| M_2' \not\models \phi'$ and the
verification terminates with a valid result.


The framework is not complete though. For an $M_1$ with multiple clocks, it is not
guaranteed that there is a DOTA assumption $A$ such that $\mathcal{L}(A) = \mathcal{L}(A_w)$. Thus, the
framework is not guaranteed to terminate. Furthermore, for an $M_2$ with multiple
clocks, the framework may not be able to learn a DOTA assumption $A$ such
that $\mathcal{L}(M_2')|_{\Sigma_A} \subseteq \mathcal{L}(A)$ even though $M_1' \| M_2' \models \phi'$.


4 Optimization Methods 


In this section, we give two improvements to the verification framework proposed
in Sect. 3. The first one reduces state space and membership queries in terms
of the given information of $M_1'$ and $\phi'$. The second one uses a smaller alphabet
than $\Sigma_A = (\Sigma_1' \cup \Sigma_{\phi'}) \cap \Sigma_2'$ to improve the verification speed.


4.1 Using Additional Information 


In the process of learning assumption $A$ with respect to $M_1'$ and $\phi'$, we make
better use of the available information of $M_1'$ and $\phi'$. If more actions can take
place from a learned location, there are likely more
successor locations from that location and more symbolic states are needed. In
general, however, not all actions are enabled in a location. Since the
logical-timed words of the models $M_1'$ or $\phi'$ are known beforehand, the sequences of
actions that can be taken can be obtained. Therefore, we can use this information
to remove those actions which do not take place from a certain location to reduce
the number of successor states. Furthermore, the number of membership queries
can be reduced by directly giving answers to those queries whose timed words
violate the action sequences. This accelerates the learning process and
speeds up the verification to some extent. The experiments in the next
section also show these improvements.

For example, suppose $M_1'$ has two actions read and write. In addition, it is known
that the write action can only be performed after the read has been executed.
So, we add such information to the learning step of the verification framework.
That is, read should take place before write in any timed word. Thus, for a
membership query with a word $\omega_l = \ldots(write, v_k)\ldots(read, v_m)\ldots$, where
write takes place before read, a negative answer is directly returned without the
model checking steps for membership queries shown in Sect. 3.3.

The additional information is usually derived from the design rules and other 
characteristics of the system under study. In the implementation, we provide 
some basic keywords to describe the rules, e.g. “beforeAfter” specifies the order 
of actions, and “startWith” specifies a certain action should be executed first. 
Therefore, the above example is encoded as “[beforeAfter]:(read,write)”. 
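A minimal sketch of how such an ordering rule could short-circuit a membership query is given below. The function names, the tuple encoding of timed actions, and the `model_check` callback are hypothetical and serve only to illustrate the idea; they do not reproduce the authors' implementation or its rule syntax.

```python
from typing import Callable, List, Sequence, Tuple

TimedAction = Tuple[str, float]  # (action, time value), assumed encoding


def violates_before_after(word: Sequence[TimedAction],
                          first: str, second: str) -> bool:
    """True if `second` occurs before any occurrence of `first`."""
    seen_first = False
    for action, _ in word:
        if action == first:
            seen_first = True
        elif action == second and not seen_first:
            return True
    return False


def answer_membership(word: Sequence[TimedAction],
                      rules: List[Tuple[str, str]],
                      model_check: Callable[[Sequence[TimedAction]], bool]) -> bool:
    """Answer negatively without model checking when a rule is violated."""
    for first, second in rules:
        if violates_before_after(word, first, second):
            return False
    return model_check(word)


calls = []
checker = lambda w: calls.append(w) or True   # stand-in for the model checker
rules = [("read", "write")]
print(answer_membership([("write", 1.0), ("read", 2.0)], rules, checker))
print(len(calls))  # the stand-in checker was not invoked for the query above
```

Only queries that respect every rule reach the (expensive) model checking step, which is where the saving in verification time comes from.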


4.2 Minimizing the Alphabet of the Assumption 


In our framework, the automated AG procedure uses a fixed assumption alphabet
$\Sigma_A = (\Sigma_1' \cup \Sigma_{\phi'}) \cap \Sigma_2'$. However, there may exist an assumption $A_s$ over a smaller
alphabet $\Sigma_s \subset \Sigma_A$ that satisfies the two premises of the AG rule. We thus
propose and implement another improvement to build the timed assumption
over a minimal alphabet. A smaller alphabet directly reduces the number
of membership queries and thus speeds up the verification process.


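Theorem 5. Given $\Sigma_A = (\Sigma_1' \cup \Sigma_{\phi'}) \cap \Sigma_2'$, if there exists an assumption $A_s$ over a
non-empty alphabet $\Sigma_s \subset \Sigma_A$ satisfying $M_1' \| A_s \models \phi'$ and $M_2' \models A_s$, then there
must exist an assumption $A$ over $\Sigma_A$ satisfying $M_1' \| A \models \phi'$ and $M_2' \models A$.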


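Proof. Based on $A_s$, we can construct a timed assumption $A$ over $\Sigma_A$ as follows.
For $A_s = (Q_s, q_0^s, \Sigma_s, F_s, X_s, \Delta_s)$, we first build $A = (Q, q_0, \Sigma_A, F, X, \Delta)$ where
$Q = Q_s$, $q_0 = q_0^s$, $F = F_s$, $\Delta = \Delta_s$ and $X = X_s$. Then for all $q \in Q$ and all
$\sigma \in \Sigma_A \setminus \Sigma_s$, we add $(q, \sigma, true, \emptyset, q)$ into $\Delta$.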

We now prove that with such $A$, $M_1' \| A \models \phi'$ and $M_2' \models A$ still hold, that is,
$M_1' \| M_2' \models \phi'$. Since the locations of $A$ and $A_s$ are the same, the locations of
$M_1' \| A$ and $M_1' \| A_s$ are the same. For the composed model $M_1' \| A$ and the newly
added transition $\delta_{new} = (q, \sigma, true, \emptyset, q)$ from state $q$ in $A$, since $\sigma \in \Sigma_A \setminus \Sigma_s$, it
will be synchronized with a transition of the form $\delta_1 = (q_c, \sigma, \varphi_c, \gamma_c, q_c')$
in $M_1'$. So in $M_1' \| A$, the composed transition with respect to $(q_c, q)$ and $\sigma$ is
$((q_c, q), \sigma, \varphi_c, \gamma_c, (q_c', q))$. In $M_1' \| A_s$, for such a transition $\delta_1$ in $M_1'$, though
there is no synchronized transition from state $q$ in $A_s$, the composed transition
is still $((q_c, q), \sigma, \varphi_c, \gamma_c, (q_c', q))$. So $M_1' \| A \models \phi'$. According to the
construction process of $A$ from $A_s$, as $M_2' \models A_s$, i.e. $\mathcal{L}(M_2')|_{\Sigma_s} \subseteq \mathcal{L}(A_s)$, it
follows that $M_2' \models A$.
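The construction used in the proof of Theorem 5 (extending an assumption over a small alphabet with unguarded, non-resetting self-loops) can be sketched as follows. The `TimedAutomaton` record, the string encoding of guards, and the function name are assumptions made for this illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# (source, action, guard, reset clocks, target); guards are plain strings here.
Transition = Tuple[str, str, str, FrozenSet[str], str]


@dataclass(frozen=True)
class TimedAutomaton:
    locations: FrozenSet[str]
    initial: str
    alphabet: FrozenSet[str]
    accepting: FrozenSet[str]
    clocks: FrozenSet[str]
    transitions: FrozenSet[Transition]


def extend_alphabet(a_s: TimedAutomaton, sigma_a: FrozenSet[str]) -> TimedAutomaton:
    """Lift an assumption over a sub-alphabet to the full alphabet sigma_a."""
    if not a_s.alphabet <= sigma_a:
        raise ValueError("the small alphabet must be a subset of sigma_a")
    # One self-loop (q, sigma, true, {}, q) per location and per new action.
    loops = frozenset(
        (q, sigma, "true", frozenset(), q)
        for q in a_s.locations
        for sigma in sigma_a - a_s.alphabet
    )
    return TimedAutomaton(a_s.locations, a_s.initial, frozenset(sigma_a),
                          a_s.accepting, a_s.clocks, a_s.transitions | loops)


small = TimedAutomaton(
    locations=frozenset({"q0", "q1"}), initial="q0", alphabet=frozenset({"a"}),
    accepting=frozenset({"q1"}), clocks=frozenset({"x"}),
    transitions=frozenset({("q0", "a", "x<=1", frozenset({"x"}), "q1")}),
)
full = extend_alphabet(small, frozenset({"a", "b"}))
print(len(full.transitions))
```

Locations, accepting set and clocks are left untouched, which mirrors the argument that the composed models over both alphabets have the same locations.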


The main problem with a smaller alphabet is that the AG rule is no longer
complete for deterministic finite automata [18]. The problem still exists for timed
automata. If $\Sigma_s \subset \Sigma_A$, then there might not exist an assumption $A_s$ over $\Sigma_s$
that satisfies the two premises of AG even though $M_1' \| M_2' \models \phi'$. In this situation,
we say $\Sigma_s$ is incomplete and needs to be refined. So each time we find that $\Sigma_s$ is
incomplete, we select another $\Sigma_s' \subset \Sigma_A$ and restart the learning algorithm.
If a large number of refinement rounds is needed, the speed of the verification
is reduced significantly. To compensate for this speed reduction, we reuse the
counterexamples that indicated the incompleteness of $\Sigma_s$ in the previous loops
and use a variable $List_c$ to store them. Before starting a new round of learning,
we use $List_c$ to judge in advance whether the current $\Sigma_s'$ is appropriate. We
say $\Sigma_s'$ is appropriately selected only if none of the counterexamples in $List_c$
indicates that $\Sigma_s'$ is incomplete.

With a small alphabet $\Sigma_s \subset \Sigma_A$, we cannot directly conclude the
verification result if $M_1' \| M_2' \not\models \phi'$. The reason is that any given counterexample
$ctx$ satisfying $M_1' \| A_{ctx} \not\models \phi' \wedge ctx|_{\Sigma_s} \in \mathcal{L}(M_2')|_{\Sigma_s}$ will be used to illustrate
the incompleteness of $\Sigma_s$, though in some cases $ctx$ indeed indicates that
$M_1' \| M_2' \not\models \phi'$ over $\Sigma_A$. As a result, the treatment of $ctx$ will decrease the overall
verification speed if $M_1' \| M_2' \not\models \phi'$. To solve this, we need to detect real
counterexamples earlier. We first check whether $M_1' \| A_{ctx} \not\models \phi' \wedge ctx|_{\Sigma_A} \in \mathcal{L}(M_2')|_{\Sigma_A}$
holds. If the result is yes, the verification concludes $M_1' \| M_2' \not\models \phi'$. Otherwise $ctx$
is used to refine the assumption over a new $\Sigma_s'$.


5 Experimental Results 


We implemented the proposed framework in Java. The membership queries and 
candidate queries are executed by calling the model checking tool UPPAAL. We 
evaluated the implementation on the benchmark of AUTOSAR (Automotive 
Open System Architecture) case studies. All the experiments were carried out 
on a 3.7 GHz AMD Ryzen 5 5600X processor with 16 GB RAM running 64-bit
Windows 10. The source code of our tool and experiments is available in [2].

AUTOSAR is an open and standardized software architecture for automotive 
ECUs (Electronic Control Units). It consists of three layers, from top to bot- 
tom: AUTOSAR Software, AUTOSAR Runtime Environment (RTE), and Basic 
Software [1]. Its safety guarantee is very important [13,34,40]. A formal timed 
model of AUTOSAR architecture consists of several tasks and their correspond- 
ing runnables, different communication mechanisms of any two runnables, RTE 
communication controllers and task schedulers. Depending on the number of
tasks and runnables, we designed three kinds of composed models: the small-scale
model AUTOSAR-1 (8 automata), and the complex-scale composed models
AUTOSAR-2 (14 automata) and AUTOSAR-3 (14 automata). The properties
of the architecture to be checked are: 1) buffers between two runnables will never
overflow or underflow, and 2) for a pair of sender runnable and receiver runnable,
they should not execute the write action simultaneously. The checking methods
we performed in the experiments are: 1) traditional monolithic model checking
via UPPAAL, 2) the compositional verification framework we propose (CV), 3)
CV with the first improvement that uses additional information of $M_1'$ and $\phi'$
(CV+A), 4) CV with the second improvement that minimizes the assumption
alphabet (CV+M), and 5) CV with both improvements (CV+A+M). Each experiment
was conducted five times to calculate the average verification time. Tables 1-4
show the detailed verification results for each property using these methods,
where Case IDs are given in the format n-m-k-l, denoting respectively the
identifiers of the verified properties, the number of locations and clocks of $M_2$, and
the alphabet size of $M_2$. The Boolean variable Valid denotes whether the property
is satisfied. The symbols $|Q|$, $|\Sigma|$, $R$, and $T_{mean}$ stand for the number of
locations and the alphabet size of the learned assumption, the number of alphabet
refinements during learning and the average verification time in seconds,
respectively.


1) AUTOSAR-1 Experiment. AUTOSAR-1 consists of 8 timed automata:
4 runnables, 2 buffers, and 2 schedulers used for scheduling the runnables. We
partition the system into two parts, where $M_1$ is a DOTA and $M_2$ is composed of
7 DOTAs. The experimental results for this case are recorded in Table 1, where
the proposed compositional verification (CV) outperforms the monolithic checking
via UPPAAL except for cases 1-71424-7-8 and 3-71424-7-8. This is because,
for these two cases, the learning algorithm needs more than 30 rounds to refine
assumptions using generated counterexamples. However, with the first
improvement (CV+A), i.e. CV with additional information of $M_1'$, the verification
time reduces drastically for these two cases. Similarly, with the
second improvement (CV+M), i.e. CV with a minimized alphabet, the verification
time decreases due to fewer membership queries. With both improvements
(CV+A+M), compared with single ones, the checking time varies depending on
the actual case. As shown in Table 1, in the case of checking property 1 with
CV+A, since the alphabet size of the learned assumption $A$ is the largest one,
i.e. 3, the second improvement can take effect. So the verification time using
CV+A+M is less than that using CV+A. However, it is worse than CV+M.

We have discussed in Sect. 3.5 that the framework can handle models for
$M_1$ which might be a multi-clock timed automaton, though termination is not
guaranteed. So, we also repartition the AUTOSAR-1 system into two parts for
verification, where $M_1$ is composed of 7 DOTAs. The results in Table 2 reveal
that the proposed compositional method outperforms UPPAAL in most of the
cases except the case 5-4-1-2. The reason is that UPPAAL might find a
counterexample faster than the compositional approach because of the on-the-fly
technique, which terminates the verification once a counterexample is found.

Table 1. Verification Results for AUTOSAR-1 where $M_1$ is a DOTA.

Case ID      Valid  UPPAAL   CV                   CV+A               CV+M                   CV+A+M
                    Tmean    |Q|  |Σ|  Tmean      |Q|  |Σ|  Tmean    |Q|  |Σ|  R   Tmean    |Q|  |Σ|  R   Tmean
1-71424-7-8  Yes    37.862   62   3    1091.419   5    3    13.864   3    2    3   4.676    3    2    3   5.896
2-71424-7-8  Yes    46.215   1    3    0.237      4    3    11.030   1    1    0   0.163    1    1    0   0.273
3-71424-7-8  Yes    38.947   62   3    995.148    3    3    6.353    2    2    1   2.723    2    2    1   3.280
4-71424-7-8  Yes    38.783   1    3    0.234      2    3    3.859    1    1    0   0.164    1    1    0   0.341


In contrast, our framework needs to spend some time learning the assumption
ahead of searching the counterexample, resulting in more time for the termination
of the verification framework. In the experiments, we also observe that the
time varies with the selection of $M_1$. Therefore, a proper selection of the
components composed as $M_1$ or $M_2$ can lead to a faster verification, while ensuring
termination of the framework.


Table 2. Verification Results for AUTOSAR-1 where $M_1$ is a composition of DOTAs

Case ID      Valid  UPPAAL   CV                   CV+A               CV+M                   CV+A+M
                    Tmean    |Q|  |Σ|  Tmean      |Q|  |Σ|  Tmean    |Q|  |Σ|  R   Tmean    |Q|  |Σ|  R   Tmean
1-4-1-2      Yes    37.862   2    2    10.117     2    2    9.722    2    2    2   12.989   2    2    2   12.681
2-4-1-2      Yes    46.215   1    2    12.298     1    2    9.316    1    1    0   11.900   1    1    0   12.022
3-4-1-2      Yes    38.947   1    2    12.208     1    2    9.391    1    1    0   11.897   1    1    0   11.941
4-4-1-2      Yes    38.783   1    2    12.195     1    2    9.237    1    1    0   11.932   1    1    0   12.013
5-4-1-2      No     0.394    3    2    6.252      2    2    2.975    3    2    2   12.626   2    2    2   9.529
6-4-1-2      Yes    38.319   3    2    23.973     1    2    13.480   3    2    1   33.569   1    2    1   22.563


2) AUTOSAR-2 Experiment. AUTOSAR-2 is a more complex system with
14 automata in total, including 6 runnables and a task to which the runnables
are mapped, 5 buffers, an RTE and a scheduler. In this experiment, we select $M_1$
as a composition of several DOTAs. The results in Table 3 show that in the cases
of properties 1-4, UPPAAL fails to obtain checking results due to the large state
space, whereas our compositional approach can finish the verification for all the
properties within 300 seconds using the same memory size. This indicates that the
framework can reduce the state space significantly in some cases.


Table 3. Verification Results for AUTOSAR-2

Case ID      Valid  UPPAAL   CV                   CV+A               CV+M                   CV+A+M
                    Tmean    |Q|  |Σ|  Tmean      |Q|  |Σ|  Tmean    |Q|  |Σ|  R   Tmean    |Q|  |Σ|  R   Tmean
1-4-1-2      Yes    ROM      1    2    295.342    1    2    263.082  1    1    0   291.945  1    1    0   292.945
2-4-1-2      Yes    ROM      1    2    298.551    1    2    265.381  1    1    0   293.617  1    1    0   290.617
3-4-1-2      Yes    ROM      1    2    295.443    1    2    264.900  1    1    0   292.244  1    1    0   291.244
4-4-1-2      Yes    ROM      1    2    295.688    1    2    271.144  1    1    0   294.194  1    1    0   295.194

ROM: run out of memory.

3) AUTOSAR-3 Experiment. The system consists of 14 components, where
both $M_1$ and $M_2$ are compositions of several DOTAs. The checking results
shown in Table 4 illustrate that the minimal alphabet improvement can obtain
the smallest alphabet with size 1, thus reducing the verification time. However,
the additional information improvement performs badly in most cases.


Table 4. Verification Results for AUTOSAR-3

Case ID      Valid  UPPAAL   CV                   CV+A               CV+M                      CV+A+M
                    Tmean    |Q|  |Σ|  Tmean      |Q|  |Σ|  Tmean    |Q|  |Σ|  R   Tmean       |Q|  |Σ|  R   Tmean
1-30-1-7     Yes    1.354    1    3    0.910      1    3    0.298    1    1    0   (illegible) 1    1    0   0.801
2-30-1-7     Yes    1.313    1    6    0.351      3    6    2.839    1    1    0   0.152       1    1    0   0.150
3-30-1-7     Yes    1.363    1    6    0.348      3    6    2.938    1    1    0   0.161       1    1    0   0.156


6 Conclusion 


Though assume-guarantee reasoning can help alleviate the state space explosion
problem of a composite model in model checking, its practical impact has been
limited due to the non-trivial human interaction needed to obtain the assumption. In this
paper, we propose a learning-based compositional verification framework for deterministic
timed automata, where the assumption is learned as a deterministic one-clock
timed automaton. We design a model conversion algorithm to acquire the clock
reset information of the learned assumption to reduce the learning complexity
and prove this conversion preserves the verification results. To make the
framework applicable to multi-clock systems, we design a smart teacher with a heuristic
to answer clock reset information. We also prove the correctness and termination
of the framework. To speed up the verification, we further give two kinds
of improvements to the learning process. We implemented the framework and 
performed experiments to evaluate our method. The results show that it outper- 
forms monolithic model checking, and the state space can be effectively reduced. 
Moreover, the improvements also have positive effects on most studied systems. 

Abstract. Online monitoring is an effective validation approach for
hybrid systems that, at runtime, checks whether the (partial) signals of a
system satisfy a specification in, e.g., Signal Temporal Logic (STL). The
classic STL monitoring is performed by computing a robustness interval 
that specifies, at each instant, how far the monitored signals are from 
violating and satisfying the specification. However, since a robustness 
interval monotonically shrinks during monitoring, classic online moni- 
tors may fail in reporting new violations or in precisely describing the 
system evolution at the current instant. In this paper, we tackle these 
issues by considering the causation of violation or satisfaction, instead 
of directly using the robustness. We first introduce a Boolean causation 
monitor that decides whether each instant is relevant to the violation or 
satisfaction of the specification. We then extend this monitor to a quan- 
titative causation monitor that tells how far an instant is from being 
relevant to the violation or satisfaction. We further show that classic 
monitors can be derived from our proposed ones. Experimental results 
show that the two proposed monitors are able to provide more detailed 
information about system evolution, without requiring a significantly 
higher monitoring cost. 


Keywords: online monitoring · Signal Temporal Logic · monotonicity


1 Introduction 


Safety-critical systems require strong correctness guarantees. Due to the com- 
plexity of these systems, offline verification may not be able to guarantee their 
total correctness, as it is often very difficult to assess all possible system behav- 
iors. To mitigate this issue, runtime verification [4,29,36] has been proposed as a 
complementary technique that analyzes the system execution at runtime. Online
monitoring is such an approach that checks whether the system execution (e.g., 
given in terms of signals) satisfies or violates a system specification specified in 
a temporal logic [28,34], e.g., Signal Temporal Logic (STL) [8,30].

Quantitative online monitoring is based on the STL robust semantics [17,21]
that not only tells whether a signal satisfies or violates a specification $\varphi$ (i.e., the
classic Boolean satisfaction relation), but also assigns a value in $\mathbb{R} \cup \{\infty, -\infty\}$
(i.e., robustness) that indicates how robustly $\varphi$ is satisfied or violated. However,
differently from offline assessment of STL formulas, an online monitor needs to
reason on partial signals and, so, the assessment of the robustness should be
adapted. We consider an established approach [12] employed by classic online
monitors (ClaM in the following). It consists in computing, instead of a single
robustness value, a robustness interval; at each monitoring step, ClaM identifies
an upper bound $[R]^U$ telling the maximal reachable robustness of any possible
suffix signal (i.e., any continuation of the system evolution), and a lower bound
$[R]^L$ telling the minimal reachable robustness. If, at some instant, $[R]^U$ becomes
negative, the specification is violated; if $[R]^L$ becomes positive, the specification
is satisfied. In the other cases, the specification validity is unknown.

Consider a simple example in Fig. 1. It shows the monitoring of the speed of
a vehicle (in the upper plot); the specification requires the speed to be always
below 10. The lower plot reports how the upper bound $[R]^U$ and the lower bound
$[R]^L$ of the reachable robustness change over time. We observe that the initial
value of $[R]^U$ is around 8 and gradually decreases.¹ The monitor allows to detect
that the specification is violated at time b = 20 when the speed becomes higher
than 10, and therefore $[R]^U$ goes below 0. After that, the violation severity
progressively gets worse till time b = 30, when $[R]^U$ becomes −5. After that point, the
monitor does not provide any additional useful information about the system
evolution, as $[R]^U$ remains stuck at −5. However, if we observe the signal of the
speed after b = 30, we notice that (i) the severity of the violation is mitigated,
and the “1st violation episode” ends at time b = 35; however, the monitor
ClaM does not report this type of information; (ii) a “2nd violation episode”
occurs in the time interval [40,45]; the monitor ClaM does not distinguish the
new violation.

[Fig. 1. ClaM: robustness upper and lower bounds of $\Box_{[0,100]}(v < 10)$. Upper plot: vehicle speed over time b, with the 1st and 2nd violation episodes marked; lower plot: output of the classic online monitor ClaM.]
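The masking effect in this example can be reproduced with a few lines of code. The sketch below assumes the specification "always speed < limit" and uses the fact that, for such a formula, the robustness upper bound over an observed prefix is the minimum of limit − speed seen so far. The function name and the sample speed trace are invented for illustration and do not correspond to the data of Fig. 1.

```python
from typing import List, Sequence


def clam_upper_bounds(speeds: Sequence[float], limit: float = 10.0) -> List[float]:
    """Running upper bound of the robustness of 'always speed < limit'."""
    bounds: List[float] = []
    best = float("inf")
    for speed in speeds:
        # The bound can only shrink: it is the worst margin observed so far.
        best = min(best, limit - speed)
        bounds.append(best)
    return bounds


# Hypothetical trace: a violation, a recovery, then a second, milder violation.
speeds = [2.0, 5.0, 12.0, 15.0, 11.0, 8.0, 13.0, 9.0]
print(clam_upper_bounds(speeds))
# -> [8.0, 5.0, -2.0, -5.0, -5.0, -5.0, -5.0, -5.0]
```

Once the bound reaches −5 it never changes again, so neither the recovery (speed 8) nor the second violation (speed 13) is visible in the monitor output.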

¹ The value of the lower bound $[R]^L$ is not shown in the figure, as it is not relevant. In the
example, it remains constant before b = 100, and the value is usually set either
according to domain knowledge about system signals, or to $-\infty$ otherwise.

The reason for the issues reported in the example is that the upper and lower
bounds are monotonically decreasing and increasing, respectively; this has the consequence
that the robustness interval at a given step is “masked” by the history of previous
robustness intervals, and, e.g., it is not possible to detect mitigation of the viola- 
tion severity. Moreover, as an extreme consequence, as soon as the monitor ClaM 
assesses the violation of the specification (i.e., the upper bound $[R]^U$ becomes
negative), or its satisfaction (i.e., the lower bound $[R]^L$ becomes positive), the
Boolean status of the monitor does not change anymore. This characteristic
directly derives from the STL semantics and it is known as the monotonicity
[9-11] of classic online monitors. Monotonicity has been recognized as a problem
of these monitors in the literature [10,37,40], since it does not allow to detect 
specific types of information that are “masked”. We informally define two types 
of information masking that can occur because of monotonicity: 


evolution masking: the monitor may not properly report the evolution of the 
system execution, e.g., mitigation of violation severity may not be detected; 

violation masking: as a special case of evolution masking, the first violation 
episode during the system execution “masks” the following ones. 


The information not reported by ClaM because of information masking is
very useful in several contexts. First of all, in some systems, the first violation of 
the specification does not mean that the system is not operating anymore, and 
one may want to continue monitoring and detect all the succeeding violations; 
this is the case, e.g., of the monitoring approach reported by Selyunin et al. [37] in 
which all the violations of the SENT protocol must be detected. Moreover, having 
a precise description of the system evolution is important for the usefulness of 
the monitoring; for example, the monitoring of the speed in Fig. 1 could be used 
in a vehicle for checking the speed and notifying the driver whenever the speed 
is approaching the critical limit; if the monitor is not able to precisely capture 
the severity of violation, it cannot be used for this type of application. 

Some works [10,37,40] try to mitigate the monotonicity issues, by “resetting”
the monitor at specific points. A recent approach has been proposed by Zhang
et al. [40] (called ResM in the following) that is able to identify each “violation 
episode” (i.e., it solves the problem of violation masking), but does not solve 
the evolution masking problem. For the example in Fig. 1, ResM is able to detect 
the two violation episodes in intervals [20,35] and [40,45], but it is not able to 
report that the speed decreases after b = 10 (in a non-violating situation), and 
that the severity of the violation is mitigated after b = 30. 

Contribution. In this paper, in order to provide more information about the
evolution of the monitored system, we propose to monitor the causation of violation
or satisfaction, instead of considering the robustness directly. To do this, we
rely on the notion of epoch [5]. At each instant, the violation (satisfaction) epoch
identifies the time instants at which the evaluation of the atomic propositions of
the specification $\varphi$ causes the violation (satisfaction) of $\varphi$.

Based on the notion of epoch, we define a Boolean causation monitor (called 
BCauM) that, at runtime, not only assesses the specification violation/satisfaction, 
but also tells whether each instant is relevant to it. Namely, BCauM marks each 
current instant b as (i) a violation causation instant, if b is added to the violation 
epoch; (ii) a satisfaction causation instant, if b is added to the satisfaction epoch; 
(iii) an irrelevant instant, if b is not added to any epoch. We show that BCauM is
able to detect all the violation episodes (so solving the violation masking issue), 
as violation causation instants can be followed by irrelevant instants. Moreover, 
we show that the information provided by the classic Boolean online monitor 
can be derived from that of the Boolean causation monitor BCauM. 

However, BCauM just tells us whether the current instant is a (violation or 
satisfaction) causation instant or not, but does not report how far the instant is 
from being a causation instant. To this aim, we introduce the notion of causation 
distance, as a quantitative measure characterizing the spatial distance of the 
signal value at b from turning b into a causation instant. Then, we propose 
the quantitative causation monitor (QCauM) that, at each instant, returns its 
causation distance. We show that using QCauM, besides solving the violation 
masking problem, we also solve the evolution masking problem. Moreover, we 
show that we can derive from QCauM both the classic quantitative monitor ClaM, 
and the Boolean causation monitor BCauM. 

Experimental results show that the proposed monitors not only provide more
information, but also do so efficiently, without requiring significant additional
monitoring time w.r.t. the existing monitors.


Outline. Section 2 reports necessary background. We introduce BCauM in Sect. 
3, and QCauM in Sect. 4. Experimental assessment of the two proposed monitors 
is reported in Sect. 5. Finally, Sect. 6 discusses some related work, and Sect. 7 
concludes the paper. 


2 Preliminaries 


In this section, we review the fundamentals of signal temporal logic (STL) in 
Sect. 2.1, and then introduce the existing classic online monitoring approach in 
Sect. 2.2. 


2.1 Signal Temporal Logic 


Let $T \in \mathbb{R}_{+}$ be a positive real, and $d \in \mathbb{N}_{+}$ be a positive integer. A d-dimensional
signal is a function $\mathbf{v} : [0,T] \to \mathbb{R}^{d}$, where $T$ is called the time horizon of $\mathbf{v}$.
Given an arbitrary time instant $t \in [0,T]$, $\mathbf{v}(t)$ is a d-dimensional real vector;
each dimension concerns a signal variable that has a certain physical meaning,
e.g., speed, RPM, acceleration, etc. In this paper, we fix a set Var of variables
and assume that a signal $\mathbf{v}$ is spatially bounded, i.e., for all $t \in [0,T]$, $\mathbf{v}(t) \in \Omega$,
where $\Omega$ is a d-dimensional hyper-rectangle.

Signal temporal logic (STL) is a widely-adopted specification language, used 
to describe the expected behavior of systems. In Definition 1 and Definition 2, 
we respectively review the syntax and the robust semantics of STL [17,21,30]. 


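Definition 1 (STL syntax). In STL, the atomic propositions $\alpha$ and the formulas
$\varphi$ are defined as follows:

\[
\alpha ::= f(w_1, \ldots, w_K) > 0 \qquad
\varphi ::= \alpha \mid \bot \mid \neg\varphi \mid \varphi \wedge \varphi \mid \Box_I \varphi \mid \Diamond_I \varphi \mid \varphi \,\mathcal{U}_I\, \varphi
\]

Here $f$ is a K-ary function $f : \mathbb{R}^{K} \to \mathbb{R}$, $w_1, \ldots, w_K \in$ Var, and $I$ is a closed
interval over $\mathbb{R}_{\geq 0}$, i.e., $I = [l,u]$, where $l, u \in \mathbb{R}$ and $l \leq u$. In the case that
$l = u$, we can use $l$ to stand for $I$. $\Box$, $\Diamond$ and $\mathcal{U}$ are temporal operators, which
are known as always, eventually and until, respectively. The always operator $\Box$
and eventually operator $\Diamond$ are two special cases of the until operator $\mathcal{U}$, where
$\Diamond_I \varphi \equiv \top \,\mathcal{U}_I\, \varphi$ and $\Box_I \varphi \equiv \neg\Diamond_I \neg\varphi$. Other common connectives such as $\vee$, $\to$ are
introduced as syntactic sugar: $\varphi_1 \vee \varphi_2 \equiv \neg(\neg\varphi_1 \wedge \neg\varphi_2)$, $\varphi_1 \to \varphi_2 \equiv \neg\varphi_1 \vee \varphi_2$.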


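Definition 2 (STL robust semantics). Let $\mathbf{v}$ be a signal, $\varphi$ be an STL formula
and $\tau \in \mathbb{R}_{+}$ be an instant. The robustness $R(\mathbf{v}, \varphi, \tau) \in \mathbb{R} \cup \{\infty, -\infty\}$ of $\mathbf{v}$
w.r.t. $\varphi$ at $\tau$ is defined by induction on the construction of formulas, as follows.

\[
\begin{aligned}
R(\mathbf{v}, \alpha, \tau) &:= f(\mathbf{v}(\tau)) \qquad
R(\mathbf{v}, \bot, \tau) := -\infty \qquad
R(\mathbf{v}, \neg\varphi, \tau) := -R(\mathbf{v}, \varphi, \tau) \\
R(\mathbf{v}, \varphi_1 \wedge \varphi_2, \tau) &:= \min\big(R(\mathbf{v}, \varphi_1, \tau),\ R(\mathbf{v}, \varphi_2, \tau)\big) \\
R(\mathbf{v}, \Box_I \varphi, \tau) &:= \inf_{t \in \tau + I} R(\mathbf{v}, \varphi, t) \qquad
R(\mathbf{v}, \Diamond_I \varphi, \tau) := \sup_{t \in \tau + I} R(\mathbf{v}, \varphi, t) \\
R(\mathbf{v}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau) &:= \sup_{t \in \tau + I} \min\Big(R(\mathbf{v}, \varphi_2, t),\ \inf_{t' \in [\tau, t)} R(\mathbf{v}, \varphi_1, t')\Big)
\end{aligned}
\]

Here, $\tau + I$ denotes the interval $[l + \tau, u + \tau]$.


The original STL semantics is Boolean, which represents whether a signal
$\mathbf{v}$ satisfies $\varphi$ at an instant $\tau$, i.e., whether $(\mathbf{v}, \tau) \models \varphi$. The robust semantics
in Definition 2 is a quantitative extension that refines the original Boolean STL
semantics, in the sense that $R(\mathbf{v}, \varphi, \tau) > 0$ implies $(\mathbf{v}, \tau) \models \varphi$, and $R(\mathbf{v}, \varphi, \tau) < 0$
implies $(\mathbf{v}, \tau) \not\models \varphi$. More details can be found in [21, Proposition 16].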
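The inductive rules of Definition 2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the paper: the signal is sampled at integer instants, formulas are encoded as nested tuples, windows are clipped at the end of the signal, and the name `robustness` is hypothetical.

```python
import math

def robustness(sig, phi, tau):
    """Robustness R(v, phi, tau) over a discrete-time signal (a sketch).

    sig: list of samples, one per integer instant.
    phi: nested tuples, e.g. ("always", (0, 3), ("atom", f)).
    Instants beyond the end of the signal are clipped (an assumption).
    """
    op = phi[0]
    if op == "atom":                      # alpha ::= f(...) > 0
        return phi[1](sig[tau])
    if op == "bottom":
        return -math.inf
    if op == "not":
        return -robustness(sig, phi[1], tau)
    if op == "and":
        return min(robustness(sig, phi[1], tau), robustness(sig, phi[2], tau))
    lo, hi = phi[1]                       # temporal operators carry I = [lo, hi]
    window = range(tau + lo, min(tau + hi, len(sig) - 1) + 1)
    if op == "always":
        return min((robustness(sig, phi[2], t) for t in window), default=math.inf)
    if op == "eventually":
        return max((robustness(sig, phi[2], t) for t in window), default=-math.inf)
    if op == "until":                     # ("until", I, phi1, phi2)
        best = -math.inf
        for t in window:
            left = min((robustness(sig, phi[2], s) for s in range(tau, t)),
                       default=math.inf)
            best = max(best, min(robustness(sig, phi[3], t), left))
        return best
    raise ValueError(f"unknown operator: {op}")

speed = [4.0, 8.0, 12.0, 9.0]
below_10 = ("atom", lambda v: 10.0 - v)   # v < 10, written as f(v) = 10 - v > 0
print(robustness(speed, ("always", (0, 3), below_10), 0))      # -2.0
print(robustness(speed, ("eventually", (0, 3), below_10), 0))  # 6.0
```

A negative result for the always formula reflects the sample 12.0 exceeding the threshold by 2, while the eventually formula is satisfied with margin 6.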


2.2 Classic Online Monitoring of STL 


STL robust semantics in Definition 2 provides an offline monitoring approach
for complete signals. Online monitoring, instead, targets a growing partial signal
at runtime. Besides the verdicts $\top$ and $\bot$, an online monitor can also report the
verdict unknown (denoted as ?), which represents a status in which the satisfaction
of $\varphi$ by the signal is not decided yet. In the following, we formally define partial
signals and introduce online monitors for STL.


Let $T$ be the time horizon of a signal $\mathbf{v}$, and let $[a,b] \subseteq [0,T]$ be a sub-interval
in the time domain $[0,T]$. A partial signal $\mathbf{v}_{a:b}$ is a function which is
only defined in the interval $[a,b]$; in the remaining domain $[0,T] \setminus [a,b]$, we denote
that $\mathbf{v}_{a:b} = \epsilon$, where $\epsilon$ stands for a value that is not defined.

Specifically, if $a = 0$ and $b \in (a,T]$, a partial signal $\mathbf{v}_{a:b}$ is called a prefix
(partial) signal; dually, if $b = T$ and $a \in [0,b)$, a partial signal $\mathbf{v}_{a:b}$ is called a
suffix (partial) signal. Given a prefix signal $\mathbf{v}_{0:b}$, a completion $\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}$ of $\mathbf{v}_{0:b}$
is defined as the concatenation of $\mathbf{v}_{0:b}$ with a suffix signal $\mathbf{v}_{b:T}$.


Definition 3 (Classic Boolean STL online monitor). Let $\mathbf{v}_{0:b}$ be a prefix
signal, and $\varphi$ be an STL formula. An online monitor $M(\mathbf{v}_{0:b}, \varphi, \tau)$ returns a
verdict in $\{\top, \bot, ?\}$ (namely, true, false, and unknown), as follows:

\[
M(\mathbf{v}_{0:b}, \varphi, \tau) :=
\begin{cases}
\top & \text{if } \forall \mathbf{v}_{b:T}.\ R(\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}, \varphi, \tau) > 0 \\
\bot & \text{if } \forall \mathbf{v}_{b:T}.\ R(\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}, \varphi, \tau) < 0 \\
? & \text{otherwise}
\end{cases}
\]

Namely, the verdicts of $M(\mathbf{v}_{0:b}, \varphi, \tau)$ are interpreted as follows:

– if every possible completion $\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}$ of $\mathbf{v}_{0:b}$ satisfies $\varphi$, then $\mathbf{v}_{0:b}$ satisfies $\varphi$;
– if every possible completion $\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}$ of $\mathbf{v}_{0:b}$ violates $\varphi$, then $\mathbf{v}_{0:b}$ violates $\varphi$;
– otherwise (i.e., there is a completion $\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}$ that satisfies $\varphi$, and there is
  a completion $\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}$ that violates $\varphi$), $M(\mathbf{v}_{0:b}, \varphi, \tau)$ reports unknown.


Note that, by Definition 3 only, we cannot synthesize a feasible online monitor,
because the possible completions for $\mathbf{v}_{0:b}$ are infinitely many. A constructive
online monitor is introduced in [12], which implements the functionality of
Definition 3 by computing the reachable robustness of $\mathbf{v}_{0:b}$. We review this monitor
in Definition 4.


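Definition 4 (Classic Quantitative STL online monitor (ClaM)). Let $\mathbf{v}_{0:b}$
be a prefix signal, and let $\varphi$ be an STL formula. We denote by $R^{\alpha}_{\max}$ and $R^{\alpha}_{\min}$ the
possible maximum and minimum bounds of the robustness $R(\mathbf{v}, \alpha, \tau)$². Then, an
online monitor $[R](\mathbf{v}_{0:b}, \varphi, \tau)$, which returns a sub-interval of $[R^{\alpha}_{\min}, R^{\alpha}_{\max}]$ at the
instant $b$, is defined as follows, by induction on the construction of formulas.

\[
\begin{aligned}
{}[R](\mathbf{v}_{0:b}, \alpha, \tau) &:=
\begin{cases}
{}\big[f(\mathbf{v}_{0:b}(\tau)),\ f(\mathbf{v}_{0:b}(\tau))\big] & \text{if } \tau \in [0,b] \\
{}\big[R^{\alpha}_{\min},\ R^{\alpha}_{\max}\big] & \text{otherwise}
\end{cases} \\
{}[R](\mathbf{v}_{0:b}, \neg\varphi, \tau) &:= -[R](\mathbf{v}_{0:b}, \varphi, \tau) \\
{}[R](\mathbf{v}_{0:b}, \varphi_1 \wedge \varphi_2, \tau) &:= \min\big([R](\mathbf{v}_{0:b}, \varphi_1, \tau),\ [R](\mathbf{v}_{0:b}, \varphi_2, \tau)\big) \\
{}[R](\mathbf{v}_{0:b}, \Box_I \varphi, \tau) &:= \inf_{t \in \tau + I} \big([R](\mathbf{v}_{0:b}, \varphi, t)\big) \\
{}[R](\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau) &:= \sup_{t \in \tau + I} \min\Big([R](\mathbf{v}_{0:b}, \varphi_2, t),\ \inf_{t' \in [\tau, t)} [R](\mathbf{v}_{0:b}, \varphi_1, t')\Big)
\end{aligned}
\]

Here, $f$ is defined as in Definition 1, and the arithmetic rules over intervals
$I = [l,u]$ are defined as follows: $-I := [-u, -l]$ and $\min(I_1, I_2) :=
[\min(l_1, l_2), \min(u_1, u_2)]$.


We denote by $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau)$ and $[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau)$ the upper bound and the
lower bound of $[R](\mathbf{v}_{0:b}, \varphi, \tau)$ respectively. Intuitively, the two bounds together
form the reachable robustness interval of the completion $\mathbf{v}_{0:b} \cdot \mathbf{v}_{b:T}$, under any
possible suffix signal $\mathbf{v}_{b:T}$. For instance, in Fig. 2, the upper bound $[R]^{U}$ at b = 20
is 0, which indicates that the robustness of the completion of the signal speed,
under any suffix, can never be larger than 0.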
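The interval computation of Definition 4 can be sketched as follows. The sketch assumes integer sampling instants, a tuple-based formula encoding, and unknown $\Omega$ (so the atomic bounds default to $\pm\infty$); the function name `clam` is hypothetical, and the eventually case is obtained from always via negation.

```python
import math

R_MIN, R_MAX = -math.inf, math.inf   # used when Omega is not known

def clam(prefix, phi, tau):
    """Return (lower, upper) robustness bounds for a prefix signal v_{0:b}.

    prefix: samples observed so far, so b = len(prefix) - 1.
    phi: ("atom", f) | ("not", p) | ("and", p, q)
         | ("always", (l, u), p) | ("eventually", (l, u), p)
    """
    op = phi[0]
    if op == "atom":
        if tau < len(prefix):             # tau in [0, b]: the value is known
            val = phi[1](prefix[tau])
            return (val, val)
        return (R_MIN, R_MAX)             # instant not observed yet
    if op == "not":
        lo, hi = clam(prefix, phi[1], tau)
        return (-hi, -lo)                 # -I := [-u, -l]
    if op == "and":
        (l1, u1), (l2, u2) = clam(prefix, phi[1], tau), clam(prefix, phi[2], tau)
        return (min(l1, l2), min(u1, u2))
    (a, b), sub = phi[1], phi[2]
    bounds = [clam(prefix, sub, t) for t in range(tau + a, tau + b + 1)]
    if op == "always":                    # inf over the window, bound-wise
        return (min(l for l, _ in bounds), min(u for _, u in bounds))
    if op == "eventually":                # sup over the window, bound-wise
        return (max(l for l, _ in bounds), max(u for _, u in bounds))
    raise ValueError(f"unknown operator: {op}")

speed = [4.0, 8.0, 12.0, 9.0]
spec = ("always", (0, 3), ("atom", lambda v: 10.0 - v))   # always v < 10
for b in range(len(speed)):
    print(b, clam(speed[: b + 1], spec, 0))
# 0 (-inf, 6.0)
# 1 (-inf, 2.0)
# 2 (-inf, -2.0)
# 3 (-2.0, -2.0)
```

As the prefix grows, the upper bound can only decrease and the lower bound can only increase, which is the behavior the intuition above describes.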

The quantitative online monitor ClaM in Definition 4 refines the Boolean one
in Definition 3, and the Boolean monitor can be derived from ClaM as follows:

– if $[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau) > 0$, it implies that $M(\mathbf{v}_{0:b}, \varphi, \tau) = \top$;
– if $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau) < 0$, it implies that $M(\mathbf{v}_{0:b}, \varphi, \tau) = \bot$;
– otherwise, if $[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau) < 0$ and $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau) > 0$, $M(\mathbf{v}_{0:b}, \varphi, \tau) =\ ?$.

² $R(\mathbf{v}, \alpha, \tau)$ is bounded because $\mathbf{v}$ is bounded by $\Omega$. In practice, if $\Omega$ is not known, we
set $R^{\alpha}_{\max}$ and $R^{\alpha}_{\min}$ to, respectively, $\infty$ and $-\infty$.
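The derivation of the Boolean verdict from the ClaM interval can be written as a small helper. This is a sketch; the function name and the string encoding of verdicts are illustrative, and reporting a bound of exactly 0 as undecided is an assumption, since the three cases above do not cover it.

```python
def boolean_verdict(lower, upper):
    """Map a ClaM interval [lower, upper] to a verdict in {"T", "F", "?"}."""
    if lower > 0:
        return "T"      # every completion satisfies the formula
    if upper < 0:
        return "F"      # every completion violates the formula
    return "?"          # undecided (also used for a bound of exactly 0)

# Illustrative intervals for a growing prefix; the values are hypothetical.
intervals = [(float("-inf"), 6.0), (float("-inf"), 2.0),
             (float("-inf"), -2.0), (-2.0, -2.0)]
print([boolean_verdict(lo, up) for lo, up in intervals])  # ['?', '?', 'F', 'F']
```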


The classic online monitors are monotonic by definition. In the Boolean monitor
(Definition 3), with the growth of $\mathbf{v}_{0:b}$, $M(\mathbf{v}_{0:b}, \varphi, \tau)$ can only turn from ? to
$\{\bot, \top\}$, but never the other way around. In the quantitative one (Definition 4), as
shown in Lemma 1, $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau)$ and $[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau)$ are both monotonic, the
former one decreasingly, the latter one increasingly. An example can be observed
in Fig. 2.

Lemma 1 (Monotonicity of STL online monitor). Let $[R](\mathbf{v}_{0:b}, \varphi, \tau)$ be
the quantitative online monitor for a partial signal $\mathbf{v}_{0:b}$ and an STL formula
$\varphi$. With the growth of the partial signal $\mathbf{v}_{0:b}$, the upper bound $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau)$
monotonically decreases, and the lower bound $[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau)$ monotonically
increases, i.e., for two time instants $b_1, b_2 \in [0,T]$, if $b_1 < b_2$, we have (i)
$[R]^{U}(\mathbf{v}_{0:b_1}, \varphi, \tau) \geq [R]^{U}(\mathbf{v}_{0:b_2}, \varphi, \tau)$, and (ii) $[R]^{L}(\mathbf{v}_{0:b_1}, \varphi, \tau) \leq [R]^{L}(\mathbf{v}_{0:b_2}, \varphi, \tau)$.

Proof. This can be proved by induction on the structures of STL formulas. The 
detailed proof can be found in the full version [38]. 


3 Boolean Causation Online Monitor 


As explained in Sect.1, monotonicity of classic online monitors causes differ- 
ent types of information masking, which prevents some information from being 
delivered. In this section, we introduce a novel Boolean causation (online) mon- 
itor BCauM, that solves the violation masking issue (see Sect. 1). BCauM is defined 
based on online signal diagnostics [5,40], which reports the cause of violation or 
satisfaction of the specification at the atomic proposition level. 


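Definition 5 (Online signal diagnostics). Let $\mathbf{v}_{0:b}$ be a partial signal and $\varphi$
be an STL specification. At an instant $b$, online signal diagnostics returns a
violation epoch $E^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau)$, under the condition $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau) < 0$, as follows:

\[
\begin{aligned}
E^{\ominus}(\mathbf{v}_{0:b}, \alpha, \tau) &:=
\begin{cases}
\{(\alpha, \tau)\} & \text{if } [R]^{U}(\mathbf{v}_{0:b}, \alpha, \tau) < 0 \\
\emptyset & \text{otherwise}
\end{cases} \\
E^{\ominus}(\mathbf{v}_{0:b}, \neg\varphi, \tau) &:= E^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau) \\
E^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \wedge \varphi_2, \tau) &:= \bigcup_{\substack{i \in \{1,2\} \text{ s.t.} \\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_i, \tau) < 0}} E^{\ominus}(\mathbf{v}_{0:b}, \varphi_i, \tau) \\
E^{\ominus}(\mathbf{v}_{0:b}, \Box_I \varphi, \tau) &:= \bigcup_{\substack{t \in \tau + I \text{ s.t.} \\ [R]^{U}(\mathbf{v}_{0:b}, \varphi, t) < 0}} E^{\ominus}(\mathbf{v}_{0:b}, \varphi, t) \\
E^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau) &:= \bigcup_{\substack{t \in \tau + I \text{ s.t.} \\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_1 \mathcal{U}_t \varphi_2, \tau) < 0}} \Big( E^{\ominus}(\mathbf{v}_{0:b}, \varphi_2, t) \cup \bigcup_{t' \in [\tau, t)} E^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, t') \Big)
\end{aligned}
\]

and a satisfaction epoch $E^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau)$, under the condition $[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau) > 0$,
as follows:

\[
\begin{aligned}
E^{\oplus}(\mathbf{v}_{0:b}, \alpha, \tau) &:=
\begin{cases}
\{(\alpha, \tau)\} & \text{if } [R]^{L}(\mathbf{v}_{0:b}, \alpha, \tau) > 0 \\
\emptyset & \text{otherwise}
\end{cases} \\
E^{\oplus}(\mathbf{v}_{0:b}, \neg\varphi, \tau) &:= E^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau) \\
E^{\oplus}(\mathbf{v}_{0:b}, \varphi_1 \wedge \varphi_2, \tau) &:= \bigcup_{\substack{i \in \{1,2\} \text{ s.t.} \\ [R]^{L}(\mathbf{v}_{0:b}, \varphi_i, \tau) > 0}} E^{\oplus}(\mathbf{v}_{0:b}, \varphi_i, \tau) \\
E^{\oplus}(\mathbf{v}_{0:b}, \Box_I \varphi, \tau) &:= \bigcup_{\substack{t \in \tau + I \text{ s.t.} \\ [R]^{L}(\mathbf{v}_{0:b}, \varphi, t) > 0}} E^{\oplus}(\mathbf{v}_{0:b}, \varphi, t) \\
E^{\oplus}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau) &:= \bigcup_{\substack{t \in \tau + I \text{ s.t.} \\ [R]^{L}(\mathbf{v}_{0:b}, \varphi_1 \mathcal{U}_t \varphi_2, \tau) > 0}} \Big( E^{\oplus}(\mathbf{v}_{0:b}, \varphi_2, t) \cup \bigcup_{t' \in [\tau, t)} E^{\oplus}(\mathbf{v}_{0:b}, \varphi_1, t') \Big)
\end{aligned}
\]

If the conditions are not satisfied, $E^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau)$ and $E^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau)$ are both $\emptyset$.
Note that the definition is recursive, thus the conditions should also be checked
for computing the violation and satisfaction epochs of the sub-formulas of $\varphi$.
Computation for other operators can be inferred from the presented ones and
the STL syntax (Definition 1).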


Intuitively, when a partial signal $\mathbf{v}_{0:b}$ violates a specification $\varphi$, a violation
epoch starts collecting the evaluations (identified by pairs of atomic propositions
and instants) of the signal at the atomic proposition level that cause the violation
of the whole formula $\varphi$ (which also applies to the satisfaction cases in a dual
manner). This is done inductively, based on the semantics of different operators:

– in the case of an atomic proposition $\alpha$, if $\alpha$ is violated at $\tau$, it collects $(\alpha, \tau)$;
– in the case of a negation $\neg\varphi$, it collects the satisfaction epoch of $\varphi$;
– in the case of a conjunction $\varphi_1 \wedge \varphi_2$, it collects the union of the violation
  epochs of the sub-formulas violated by the partial signal;
– in the case of an always operator $\Box_I \varphi$, it collects the epochs of the sub-formula
  $\varphi$ at all the instants $t$ where $\varphi$ is evaluated as being violated;
– in the case of an until operator $\varphi_1 \,\mathcal{U}_I\, \varphi_2$, it collects the epochs of the sub-formula
  $\varphi_2$ at all the instants $t$ and the epochs of $\varphi_1$ at the instants $t' \in [\tau, t)$,
  in the case where the clause "$\varphi_1$ until $\varphi_2$" is violated at $t$.
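The inductive collection described above can be sketched in Python for atomic propositions, negation, conjunction and always. The sketch assumes integer sampling instants and unknown $\Omega$; the tuple encoding and the names `bounds` and `epoch` are hypothetical. The "s.t." side conditions of Definition 5 are realized by returning the empty set whenever the condition on the sub-formula fails.

```python
import math

def bounds(prefix, phi, tau):
    """ClaM-style (lower, upper) robustness bounds, as in Definition 4."""
    op = phi[0]
    if op == "atom":                       # ("atom", name, f)
        if tau < len(prefix):
            val = phi[2](prefix[tau])
            return (val, val)
        return (-math.inf, math.inf)
    if op == "not":
        lo, hi = bounds(prefix, phi[1], tau)
        return (-hi, -lo)
    if op == "and":
        (l1, u1), (l2, u2) = bounds(prefix, phi[1], tau), bounds(prefix, phi[2], tau)
        return (min(l1, l2), min(u1, u2))
    if op == "always":
        (a, b), sub = phi[1], phi[2]
        bs = [bounds(prefix, sub, t) for t in range(tau + a, tau + b + 1)]
        return (min(l for l, _ in bs), min(u for _, u in bs))
    raise ValueError(f"unknown operator: {op}")

def epoch(prefix, phi, tau, violation=True):
    """Violation (or satisfaction) epoch as a set of (atom_name, instant) pairs."""
    lo, hi = bounds(prefix, phi, tau)
    if (violation and not hi < 0) or (not violation and not lo > 0):
        return set()                       # condition of Definition 5 not met
    op = phi[0]
    if op == "atom":
        return {(phi[1], tau)}
    if op == "not":                        # negation swaps the kind of epoch
        return epoch(prefix, phi[1], tau, not violation)
    if op == "and":
        return (epoch(prefix, phi[1], tau, violation)
                | epoch(prefix, phi[2], tau, violation))
    if op == "always":
        (a, b), sub = phi[1], phi[2]
        collected = set()
        for t in range(tau + a, tau + b + 1):
            collected |= epoch(prefix, sub, t, violation)
        return collected
    raise ValueError(f"unknown operator: {op}")

speed = [4.0, 8.0, 12.0, 9.0]
spec = ("always", (0, 3), ("atom", "v<10", lambda v: 10.0 - v))
print(epoch(speed[:2], spec, 0))   # set()
print(epoch(speed[:3], spec, 0))   # {('v<10', 2)}
print(epoch(speed, spec, 0))       # {('v<10', 2)}
```

In this toy run the epoch is empty while the specification is still undecided, gains the pair for instant 2 when the violation occurs, and does not grow at instant 3, where the signal is back below the threshold.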


Example 1. The example in Fig. 2 illustrates how an epoch is collected. The
specification requires that whenever the speed is higher than 10, the car should
decelerate within 5 time units. As shown by the classic monitor, the specification
is violated at b = 25, since v becomes higher than 10 at 20 but a remains positive
during [20, 25]. Note that the specification can be rewritten as
$\varphi \equiv \Box_{[0,100]}(\neg(v > 10) \vee \Diamond_{[0,5]}(a < 0))$. For convenience, we name the sub-formulas of $\varphi$ as follows:

\[
\varphi' \equiv \neg(v > 10) \vee \Diamond_{[0,5]}(a < 0) \qquad
\varphi_1 \equiv \neg(v > 10) \qquad
\varphi_2 \equiv \Diamond_{[0,5]}(a < 0) \qquad
\alpha_1 \equiv v > 10 \qquad
\alpha_2 \equiv a < 0
\]

Fig. 2. Classic monitor (ClaM) result for the STL specification
$\Box_{[0,100]}(v > 10 \to \Diamond_{[0,5]}(a < 0))$. (Panels: signals v (speed) and a (acceleration) over b (time);
classic Boolean online monitor; classic quantitative online monitor ClaM, with bounds $[R]^{U}$ and $[R]^{L}$.)

Fig. 3. The violation epochs (the red parts) respectively when b = 30 and b = 35.
(Panels: v (speed) and a (acceleration).)

Fig. 4. Boolean causation monitor (BCauM) result

Figure 3 shows the violation epochs at two instants 30 and 35. First, at b = 30,

\[
\begin{aligned}
E^{\ominus}(\mathbf{v}_{0:30}, \varphi, 0)
&= \Big(\bigcup_{t \in [20,25]} E^{\oplus}(\mathbf{v}_{0:30}, \alpha_1, t)\Big) \cup \Big(\bigcup_{t \in [20,30]} E^{\ominus}(\mathbf{v}_{0:30}, \alpha_2, t)\Big) \\
&= (\alpha_1, [20,25]) \cup (\alpha_2, [20,30])
\end{aligned}
\]

Similarly, the violation epoch $E^{\ominus}(\mathbf{v}_{0:35}, \varphi, 0)$ at b = 35 is the same as that at
b = 30. Intuitively, the epoch at b = 30 shows the cause of the violation of $\mathbf{v}_{0:30}$;
then, since signal a < 0 in [30, 35], this segment is not considered as the cause of
the violation, so the epoch remains the same at b = 35. ◁


Definition 6 (Boolean causation monitor (BCauM)). Let $\mathbf{v}_{0:b}$ be a partial
signal and $\varphi$ be an STL specification. We denote by $A$ the set of atomic propositions
of $\varphi$. At each instant $b$, a Boolean causation (online) monitor BCauM
returns a verdict in $\{\ominus, \oplus, \oslash\}$ (called violation causation, satisfaction causation
and irrelevant), which is defined as follows:

\[
\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau) :=
\begin{cases}
\ominus & \text{if } \exists \alpha \in A.\ (\alpha, b) \in E^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau) \\
\oplus & \text{if } \exists \alpha \in A.\ (\alpha, b) \in E^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau) \\
\oslash & \text{otherwise}
\end{cases}
\]

An instant $b$ is called a violation/satisfaction causation instant if $\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau)$
returns $\ominus$/$\oplus$, or an irrelevant instant if $\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau)$ returns $\oslash$.


Intuitively, if the current instant $b$ (with the related $\alpha$) is included in the epoch
(thus the signal value at $b$ is relevant to the violation/satisfaction of $\varphi$), BCauM will
report a violation/satisfaction causation ($\ominus$/$\oplus$); otherwise, it will report irrelevant
($\oslash$). Notably, BCauM is non-monotonic, in that even if it reports $\ominus$ or $\oplus$ at some
instant $b$, it may still report $\oslash$ after $b$. This feature allows BCauM to bring more
information, e.g., it can detect the end of a violation episode and the start of a new
one (i.e., it solves the violation masking issue in Sect. 1); see Example 2.


Example 2. Based on the signal diagnostics in Fig. 3, the Boolean causation
monitor BCauM reports the result shown in Fig. 4.

Compared to the classic Boolean monitor in Fig. 2, BCauM brings more information,
in the sense that it detects the end of the violation episode at b = 30,
by going from $\ominus$ to $\oslash$, when the signal a becomes negative. ◁
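The verdict of Definition 6 only needs to check whether the current instant occurs in one of the epochs. A minimal sketch, assuming epochs are given as sets of (atomic proposition, instant) pairs over integer instants; the function name and the string verdicts are illustrative:

```python
def bcaum_verdict(violation_epoch, satisfaction_epoch, b):
    """Return "violation", "satisfaction" or "irrelevant" for the current instant b."""
    if any(t == b for _, t in violation_epoch):
        return "violation"
    if any(t == b for _, t in satisfaction_epoch):
        return "satisfaction"
    return "irrelevant"

# Violation epoch of Example 1, discretized at integer instants (an assumption):
# alpha1 over [20, 25] and alpha2 over [20, 30]; it is the same at b = 30 and b = 35.
epoch = ({("alpha1", t) for t in range(20, 26)}
         | {("alpha2", t) for t in range(20, 31)})
print(bcaum_verdict(epoch, set(), 30))  # violation
print(bcaum_verdict(epoch, set(), 35))  # irrelevant
```

Because the epoch does not grow between b = 30 and b = 35, the verdict at b = 35 is irrelevant, which is how the end of the violation episode becomes visible.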


Theorem 1 states the relation of BCauM with the classic Boolean online monitor. 


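Theorem 1. The Boolean causation monitor BCauM in Definition 6 refines the
classic Boolean online monitor in Definition 3, in the following sense:

– $M(\mathbf{v}_{0:b}, \varphi, \tau) = \bot$ iff $\bigvee_{t \in [0,b]} \big(\mathcal{M}(\mathbf{v}_{0:t}, \varphi, \tau) = \ominus\big)$
– $M(\mathbf{v}_{0:b}, \varphi, \tau) = \top$ iff $\bigvee_{t \in [0,b]} \big(\mathcal{M}(\mathbf{v}_{0:t}, \varphi, \tau) = \oplus\big)$
– $M(\mathbf{v}_{0:b}, \varphi, \tau) =\ ?$ iff $\bigwedge_{t \in [0,b]} \big(\mathcal{M}(\mathbf{v}_{0:t}, \varphi, \tau) = \oslash\big)$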


Proof. The proof is based on Definitions 5 and 6, Lemma 1 about the monotonic- 
ity of classic STL online monitors, and two extra lemmas in the full version [38]. 


4 Quantitative Causation Online Monitor 


Although BCauM in Sect. 3 is able to solve the violation masking issue, it still
does not provide enough information about the evolution of the system signals,
i.e., it does not solve the evolution masking issue introduced in Sect. 1. To tackle
this issue, we propose a quantitative (online) causation monitor QCauM in
Definition 7, which is a quantitative extension of BCauM. Given a partial signal $\mathbf{v}_{0:b}$,
QCauM reports a violation causation distance $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau)$ and a satisfaction
causation distance $[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau)$, which, respectively, indicate how far the signal
value at the current instant $b$ is from turning $b$ into a violation causation
instant and from turning $b$ into a satisfaction causation instant.


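Definition 7 (Quantitative causation monitor (QCauM)). Let $\mathbf{v}_{0:b}$ be a
partial signal, and $\varphi$ be an STL specification. At instant $b$, the quantitative
causation monitor QCauM returns a violation causation distance $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau)$,
as follows:

\[
\begin{aligned}
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \alpha, \tau) &:=
\begin{cases}
f(\mathbf{v}_{0:b}(\tau)) & \text{if } b = \tau \\
R^{\alpha}_{\max} & \text{otherwise}
\end{cases} \\
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \neg\varphi, \tau) &:= -[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau) \\
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \wedge \varphi_2, \tau) &:= \min\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big) \\
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) &:= \min\Big(\max\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big),\ \max\big([R]^{U}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big)\Big) \\
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \Box_I \varphi, \tau) &:= \inf_{t \in \tau + I} \big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, t)\big) \\
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \Diamond_I \varphi, \tau) &:= \inf_{t \in \tau + I} \Big(\max\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, t),\ [R]^{U}(\mathbf{v}_{0:b}, \Diamond_I \varphi, \tau)\big)\Big) \\
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau) &:= \inf_{t \in \tau + I} \max\Big(\min\big(\inf_{t' \in [\tau, t)} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, t'),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_2, t)\big),\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau)\Big)
\end{aligned}
\]

and a satisfaction causation distance $[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau)$, as follows:

\[
\begin{aligned}
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \alpha, \tau) &:=
\begin{cases}
f(\mathbf{v}_{0:b}(\tau)) & \text{if } b = \tau \\
R^{\alpha}_{\min} & \text{otherwise}
\end{cases} \\
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \neg\varphi, \tau) &:= -[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau) \\
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_1 \wedge \varphi_2, \tau) &:= \max\Big(\min\big([\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [R]^{L}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big),\ \min\big([R]^{L}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big)\Big) \\
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) &:= \max\big([\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big) \\
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \Box_I \varphi, \tau) &:= \sup_{t \in \tau + I} \Big(\min\big([\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, t),\ [R]^{L}(\mathbf{v}_{0:b}, \Box_I \varphi, \tau)\big)\Big) \\
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \Diamond_I \varphi, \tau) &:= \sup_{t \in \tau + I} \big([\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, t)\big)
\end{aligned}
\]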


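\[
{}[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau) := \sup_{t \in \tau + I} \max\Big(
\min\big(\inf_{t' \in [\tau, t)} [R]^{L}(\mathbf{v}_{0:b}, \varphi_1, t'),\ [\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_2, t)\big),\
\min\big([R]^{L}(\mathbf{v}_{0:b}, \varphi_2, t),\ \inf_{t' \in [\tau, t)} [R]^{L}(\mathbf{v}_{0:b}, \varphi_1, t'),\ \sup_{t' \in [\tau, t)} [\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi_1, t')\big)
\Big)
\]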


Intuitively, a violation causation distance $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau)$ is the spatial distance
of the signal value $\mathbf{v}_{0:b}(b)$, at the current instant $b$, from turning $b$ into a violation
causation instant such that $b$ is relevant to the violation of $\varphi$ (the same applies to
the satisfaction case dually). It is computed inductively on the structure of $\varphi$:

– Case atomic propositions $\alpha$: if $b = \tau$ (i.e., the instant at which $\alpha$ should be
  evaluated), then the distance of $b$ from being a violation causation instant is
  $f(\mathbf{v}_{0:b}(b))$; otherwise, if $b \neq \tau$, regardless of the value of $f(\mathbf{v}_{0:b}(b))$, $b$ can never be a
  violation causation instant, according to Definition 5, because only $f(\mathbf{v}_{0:b}(\tau))$
  is relevant to the violation of $\alpha$. Hence, the distance will be $R^{\alpha}_{\max}$;
– Case $\neg\varphi$: $b$ is a violation causation instant for $\neg\varphi$ if $b$ is a satisfaction causation
  instant for $\varphi$, so $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \neg\varphi, \tau)$ depends on $[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau)$;
– Case $\varphi_1 \wedge \varphi_2$: $b$ is a violation causation instant for $\varphi_1 \wedge \varphi_2$ if $b$ is a violation
  causation instant for either $\varphi_1$ or $\varphi_2$, so $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \wedge \varphi_2, \tau)$ depends on
  the minimum between $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, \tau)$ and $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_2, \tau)$;
– Case $\varphi_1 \vee \varphi_2$: $b$ is a violation causation instant for $\varphi_1 \vee \varphi_2$ if, first, $\varphi_1 \vee \varphi_2$
  has been violated at $b$, and second, $b$ is a violation causation instant for
  either $\varphi_1$ or $\varphi_2$. Hence, $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau)$ depends on both the violation
  status (measured by $[R]^{U}(\mathbf{v}_{0:b}, \varphi_i, \tau)$) of one sub-formula and the violation
  causation distance of the other sub-formula;
– Case $\Box_I \varphi$: $b$ is a violation causation instant for $\Box_I \varphi$ if $b$ is a violation
  causation instant for the sub-formula $\varphi$ evaluated at any instant in $\tau + I$.
  So, $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \Box_I \varphi, \tau)$ depends on the infimum of the violation causation
  distances regarding $\varphi$ evaluated at the instants in $\tau + I$;
– Case $\Diamond_I \varphi$: $b$ is a violation causation instant for $\Diamond_I \varphi$ if, first, $\Diamond_I \varphi$ has been
  violated at $b$, and second, $b$ is a violation causation instant for the sub-formula
  $\varphi$ evaluated at any instant in $\tau + I$. So, $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \Diamond_I \varphi, \tau)$ depends on both
  the violation status of $\Diamond_I \varphi$ (measured by $[R]^{U}(\mathbf{v}_{0:b}, \Diamond_I \varphi, \tau)$) and the infimum
  of the violation causation distances of $\varphi$ evaluated in $\tau + I$;
– Case $\varphi_1 \,\mathcal{U}_I\, \varphi_2$: $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau)$ depends on, first, the violation status of
  the whole formula (measured by $[R]^{U}(\mathbf{v}_{0:b}, \varphi_1 \,\mathcal{U}_I\, \varphi_2, \tau)$), and also, the infimum
  of the violation causation distances regarding the evaluation of "$\varphi_1$ holds until
  $\varphi_2$" at each instant in $\tau + I$.

Fig. 5. Quantitative causation monitor (QCauM) result for Example 1. (Panel:
quantitative causation monitor QCauM over b (time).)


Similarly, we can also compute the satisfaction causation distance. We use Exam- 
ple 3 to illustrate the quantitative causation monitor for the signals in Example 1. 
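The case analysis of Definition 7 can be sketched in Python for the non-until operators. The sketch assumes integer sampling instants and unknown $\Omega$ (so $R^{\alpha}_{\max}$ and $R^{\alpha}_{\min}$ become $\pm\infty$); the tuple encoding and the names `bounds` and `dist` are hypothetical, and the ClaM bounds are recomputed naively rather than incrementally.

```python
import math

INF = math.inf

def bounds(prefix, phi, tau):
    """ClaM-style (lower, upper) robustness bounds (Definition 4)."""
    op = phi[0]
    if op == "atom":
        if tau < len(prefix):
            return (phi[1](prefix[tau]),) * 2
        return (-INF, INF)
    if op == "not":
        lo, hi = bounds(prefix, phi[1], tau)
        return (-hi, -lo)
    if op in ("and", "or"):
        (l1, u1), (l2, u2) = bounds(prefix, phi[1], tau), bounds(prefix, phi[2], tau)
        pick = min if op == "and" else max
        return (pick(l1, l2), pick(u1, u2))
    (a, b), sub = phi[1], phi[2]           # "always" / "eventually"
    bs = [bounds(prefix, sub, t) for t in range(tau + a, tau + b + 1)]
    pick = min if op == "always" else max
    return (pick(l for l, _ in bs), pick(u for _, u in bs))

def dist(prefix, phi, tau, viol=True):
    """Violation (viol=True) or satisfaction causation distance at b = len(prefix) - 1."""
    b, op = len(prefix) - 1, phi[0]
    if op == "atom":
        if b == tau:
            return phi[1](prefix[b])
        return INF if viol else -INF       # R_max / R_min with Omega unknown
    if op == "not":
        return -dist(prefix, phi[1], tau, not viol)
    if op in ("and", "or"):
        d1, d2 = dist(prefix, phi[1], tau, viol), dist(prefix, phi[2], tau, viol)
        if (op == "and") == viol:          # violation of "and", satisfaction of "or"
            return min(d1, d2) if viol else max(d1, d2)
        i = 1 if viol else 0               # upper bound for violation, lower otherwise
        r1, r2 = bounds(prefix, phi[1], tau)[i], bounds(prefix, phi[2], tau)[i]
        if viol:
            return min(max(d1, r2), max(r1, d2))
        return max(min(d1, r2), min(r1, d2))
    (lo, hi), sub = phi[1], phi[2]
    ds = [dist(prefix, sub, t, viol) for t in range(tau + lo, tau + hi + 1)]
    if (op == "always") == viol:           # no side condition on the whole formula
        return min(ds) if viol else max(ds)
    r = bounds(prefix, phi, tau)[1 if viol else 0]
    return min(max(d, r) for d in ds) if viol else max(min(d, r) for d in ds)

speed = [4.0, 8.0, 12.0, 9.0]
spec = ("always", (0, 3), ("atom", lambda v: 10.0 - v))   # always v < 10
print([dist(speed[: b + 1], spec, 0) for b in range(4)])  # [6.0, 2.0, -2.0, 1.0]
```

In this toy run the distance becomes negative at the violating sample and positive again afterwards, so, unlike the ClaM upper bound, it is not monotonic.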


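Example 3. Consider the quantitative causation monitor for the signals in
Example 1. At b = 30, the violation causation distance is computed as:

\[
\begin{aligned}
{}[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:30}, \varphi, 0)
&= \inf_{t \in [0,100]} \big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:30}, \varphi', t)\big) \\
&= \inf_{t \in [0,100]} \min\Big(\max\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:30}, \varphi_1, t),\ [R]^{U}(\mathbf{v}_{0:30}, \varphi_2, t)\big),\ \max\big([R]^{U}(\mathbf{v}_{0:30}, \varphi_1, t),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:30}, \varphi_2, t)\big)\Big) \\
&= \inf_{t \in [0,100]} \min\Big(\max\big(-[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:30}, \alpha_1, t),\ \sup_{t' \in t + [0,5]} [R]^{U}(\mathbf{v}_{0:30}, \alpha_2, t')\big),\
\max\big([R]^{U}(\mathbf{v}_{0:30}, \varphi_1, t),\ \inf_{t' \in t + [0,5]} \max([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:30}, \alpha_2, t'),\ [R]^{U}(\mathbf{v}_{0:30}, \varphi_2, t))\big)\Big) \\
&= \max\Big(-[R]^{L}(\mathbf{v}_{0:30}, \alpha_1, 25),\ [R]^{U}(\mathbf{v}_{0:30}, \varphi_2, 25),\ \inf_{t' \in [25,30]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:30}, \alpha_2, t')\Big) \\
&= \max(-3, -3, -5) = -3.
\end{aligned}
\]

Similarly, at b = 35, the violation causation distance $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:35}, \varphi, 0) = 5$.
See the result of QCauM shown in Fig. 5. Compared to ClaM in Fig. 2, it is evident
that QCauM provides much more information about the system evolution, e.g.,
it can report that, in the interval [15, 20], the system satisfies the specification
"more", as the speed decreases. ◁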


By using the violation and satisfaction causation distances reported by QCauM 
jointly, we can infer the verdict of BCauM, as indicated by Theorem 2. 


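Theorem 2. The quantitative causation monitor QCauM in Definition 7 refines
the Boolean causation monitor BCauM in Definition 6, in the sense that:

– if $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau) < 0$, it implies $\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau) = \ominus$;
– if $[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau) > 0$, it implies $\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau) = \oplus$;
– if $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi, \tau) > 0$ and $[\mathcal{R}]^{\oplus}(\mathbf{v}_{0:b}, \varphi, \tau) < 0$, it implies $\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau) = \oslash$.


Proof. The proof is generally based on mathematical induction. First, by
Definition 7 and Definition 5, it is straightforward that Theorem 2 holds for the
atomic propositions.

Then, assuming that Theorem 2 holds for an arbitrary formula $\varphi$, we prove
that Theorem 2 also holds for the composite formula $\varphi'$ constructed by applying
STL operators to $\varphi$. The complete proof for all three cases is shown in the full
version [38].

As an instance, we show the proof for the first case with $\varphi' = \varphi_1 \vee \varphi_2$, i.e.,
we prove that $[\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) < 0$ implies $\mathcal{M}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) = \ominus$.

\[
\begin{aligned}
& [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) < 0 \\
\Rightarrow\ & \max\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big) < 0 && \text{(by Def. 7 and w.l.o.g.)} \\
\Rightarrow\ & [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, \tau) < 0 && \text{(by def. of max)} \\
\Rightarrow\ & \mathcal{M}(\mathbf{v}_{0:b}, \varphi_1, \tau) = \ominus && \text{(by assumption)} \\
\Rightarrow\ & E^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) \supseteq E^{\ominus}(\mathbf{v}_{0:b}, \varphi_1, \tau) && \text{(by Def. 5 and Thm. 1)} \\
\Rightarrow\ & \exists \alpha.\ (\alpha, b) \in E^{\ominus}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) && \text{(by def. of } \supseteq\text{)} \\
\Rightarrow\ & \mathcal{M}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) = \ominus && \text{(by Def. 6)}
\end{aligned}
\]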


The relation between the quantitative causation monitor QCauM and the 
Boolean causation monitor BCauM, disclosed by Theorem 2, can be visualized 
by the comparison between Fig. 5 and Fig. 4. Indeed, when the violation causation
distance reported by QCauM is negative in Fig. 5, BCauM reports $\ominus$ in Fig. 4.
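The three cases of Theorem 2 translate directly into a lookup from the two distances to a BCauM verdict. A minimal sketch; the function name and the string verdicts are illustrative, and returning `None` when a distance is exactly 0 reflects that the theorem as stated does not cover that case.

```python
def verdict_from_distances(violation_distance, satisfaction_distance):
    """Derive the BCauM verdict from the two QCauM distances (Theorem 2)."""
    if violation_distance < 0:
        return "violation"
    if satisfaction_distance > 0:
        return "satisfaction"
    if violation_distance > 0 and satisfaction_distance < 0:
        return "irrelevant"
    return None   # a distance of exactly 0 is not covered by the three cases

# Violation distances -3 (b = 30) and 5 (b = 35) are taken from Example 3;
# the satisfaction distance -7.0 is a hypothetical placeholder.
print(verdict_from_distances(-3.0, -7.0))  # violation
print(verdict_from_distances(5.0, -7.0))   # irrelevant
```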

Next, we present Theorem 3, which states the relation between the quanti- 
tative causation monitor QCauM and the classic quantitative monitor ClaM. 


Theorem 3. The quantitative causation monitor QCauM in Definition 7 refines
the classic quantitative online monitor ClaM in Definition 4, in the sense that
the monitoring results of ClaM can be reconstructed from the results of QCauM,
as follows:

\[
\begin{aligned}
{}[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau) &= \inf_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi, \tau) && (1) \\
{}[R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau) &= \sup_{t \in [0,b]} [\mathcal{R}]^{\oplus}(\mathbf{v}_{0:t}, \varphi, \tau) && (2)
\end{aligned}
\]


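Proof. The proof is generally based on mathematical induction. First, by
Definition 7 and Definition 4, it is straightforward that Theorem 3 holds for the
atomic propositions.

Then, we make the global assumption that Theorem 3 holds for an arbitrary
formula $\varphi$, i.e., both the two cases $\inf_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi, \tau) = [R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau)$
and $\sup_{t \in [0,b]} [\mathcal{R}]^{\oplus}(\mathbf{v}_{0:t}, \varphi, \tau) = [R]^{L}(\mathbf{v}_{0:b}, \varphi, \tau)$ hold. Based on this assumption,
we prove that Theorem 3 also holds for the composite formula $\varphi'$ constructed
by applying STL operators to $\varphi$.

As an instance, we prove $\inf_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi', \tau) = [R]^{U}(\mathbf{v}_{0:b}, \varphi', \tau)$ with
$\varphi' = \varphi_1 \vee \varphi_2$ as follows. The complete proof is presented in the full version [38].


First, if $b = \tau$, it holds that:

\[
\begin{aligned}
\inf_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi_1 \vee \varphi_2, \tau)
&= [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:\tau}, \varphi_1 \vee \varphi_2, \tau) \\
&= \max\big([R]^{U}(\mathbf{v}_{0:\tau}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:\tau}, \varphi_2, \tau)\big) && \text{(by Def. 7 and global assump.)} \\
&= [R]^{U}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau) && \text{(by Def. 4)}
\end{aligned}
\]

Then, we make a local assumption that, given an arbitrary $b$, it holds that
$\inf_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi_1 \vee \varphi_2, \tau) = [R]^{U}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau)$. We prove that, for $b'$,
which is the next sampling point after $b$, it holds that,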


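\[
\begin{aligned}
& \inf_{t \in [0,b']} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi_1 \vee \varphi_2, \tau) \\
&= \min\big([R]^{U}(\mathbf{v}_{0:b}, \varphi_1 \vee \varphi_2, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_1 \vee \varphi_2, \tau)\big) && \text{(by local assump.)} \\
&= \min\Big(\max\big([R]^{U}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big),\
\max\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:b'}, \varphi_2, \tau)\big),\
\max\big([R]^{U}(\mathbf{v}_{0:b'}, \varphi_1, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_2, \tau)\big)\Big) && \text{(by Defs. 4 \& 7)} \\
&= \min\Big(\max\big([R]^{U}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:b}, \varphi_2, \tau)\big),\
\max\big([\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_1, \tau),\ \min([R]^{U}(\mathbf{v}_{0:b}, \varphi_2, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_2, \tau))\big),\
\max\big(\min([R]^{U}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_1, \tau)),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_2, \tau)\big)\Big) && \text{(by global assump.)} \\
&= \max\Big(\min\big([R]^{U}(\mathbf{v}_{0:b}, \varphi_1, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_1, \tau)\big),\
\min\big([R]^{U}(\mathbf{v}_{0:b}, \varphi_2, \tau),\ [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:b'}, \varphi_2, \tau)\big)\Big) && \text{(by def. of min, max)} \\
&= \max\big([R]^{U}(\mathbf{v}_{0:b'}, \varphi_1, \tau),\ [R]^{U}(\mathbf{v}_{0:b'}, \varphi_2, \tau)\big) && \text{(by global assump.)} \\
&= [R]^{U}(\mathbf{v}_{0:b'}, \varphi_1 \vee \varphi_2, \tau) && \text{(by Def. 4)}
\end{aligned}
\]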
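Fig. 6. Refinement among STL monitors. (Diagram nodes: the Boolean semantics
$(\mathbf{v}, \tau) \models \varphi$, the classic Boolean monitor $M(\mathbf{v}_{0:b}, \varphi, \tau)$ and BCauM $\mathcal{M}(\mathbf{v}_{0:b}, \varphi, \tau)$, related
by Thm. 1; the robust semantics $R(\mathbf{v}, \varphi, \tau)$, the ClaM bounds $[R]^{U}$, $[R]^{L}$ and the QCauM
distances $[\mathcal{R}]^{\ominus}$, $[\mathcal{R}]^{\oplus}$, related by Thm. 3.)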


Theorem 3 shows that the result $[R]^{U}(\mathbf{v}_{0:b}, \varphi, \tau)$ of ClaM can be derived from
the result of QCauM by applying $\inf_{t \in [0,b]} [\mathcal{R}]^{\ominus}(\mathbf{v}_{0:t}, \varphi, \tau)$. For instance, comparing
the results of QCauM in Fig. 5 and the results of ClaM in Fig. 2, we can find that
the results in Fig. 2 can be reconstructed by using the results in Fig. 5.
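Equations (1) and (2) amount to a running minimum and a running maximum over the distances reported so far. A minimal sketch, assuming the distances are available as lists indexed by sampling instant; the function name and the sample values are hypothetical.

```python
from itertools import accumulate

def clam_from_qcaum(violation_distances, satisfaction_distances):
    """Rebuild the ClaM bounds from QCauM outputs via Eq. (1) and Eq. (2)."""
    upper = list(accumulate(violation_distances, min))     # running infimum
    lower = list(accumulate(satisfaction_distances, max))  # running supremum
    return lower, upper

# Illustrative distances for four sampling instants (not taken from the paper).
NEG_INF = float("-inf")
lower, upper = clam_from_qcaum([6.0, 2.0, -2.0, 1.0],
                               [NEG_INF, NEG_INF, NEG_INF, -2.0])
print(upper)  # [6.0, 2.0, -2.0, -2.0]
print(lower)  # [-inf, -inf, -inf, -2.0]
```

The reconstructed upper bound stays at -2.0 after the violation even though the distance itself recovers to 1.0, which is the information the classic monitor masks.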


Remark 1. Figure 6 shows the refinement relations between the six STL monitoring
approaches. The left column lists the offline monitoring approaches
derived directly from the Boolean and quantitative semantics of STL respectively.
The middle column shows the classic online monitoring approaches. Our
two causation monitors, namely BCauM and QCauM, are given in the column on
the right. Given a pair (A, B) of the approaches, A → B indicates that the approach
B refines the approach A, in the sense that B can deliver more information
than A, and the information delivered by A can be derived from the information
delivered by B. It is clear that the refinement relation in the figure is
transitive. Note that the blue arrows are contributed by this paper. As shown by
Fig. 6, the relation between BCauM and QCauM is analogous to that between the
Boolean and quantitative semantics of STL.


5 Experimental Evaluation 


We implemented a tool³ for our two causation monitors. It is built on top of
Breach [15], a widely used tool for monitoring and testing of hybrid systems [18].
Consistently with Breach, the monitors target the output signals given by
Simulink models, as an additional block. Experiments were executed on a macOS
machine, 1.4 GHz Quad-Core Intel Core i5, 8 GB RAM, using Breach v1.10.0.


5.1 Experiment Setting 


Benchmarks. We perform the experiments on the following two benchmarks.
Abstract Fuel Control (AFC) is a powertrain control system from Toyota [27],
which has been widely used as a benchmark in the hybrid system community [18-20].
The system outputs the air-to-fuel ratio AF, and requires that the deviation
of AF from its reference value AFref should not be too large. Specifically, we
consider the following properties from different perspectives:

– $\varphi_1^{\mathrm{AFC}} := \Box_{[10,50]}(|AF - AFref| < 0.1)$: the deviation should always be small;
– $\varphi_2^{\mathrm{AFC}} := \Box_{[10,48.5]}\Diamond_{[0,1.5]}(|AF - AFref| < 0.08)$: a large deviation should not
  last for too long;
– $\varphi_3^{\mathrm{AFC}} := \Box_{[10,48]}(|AF - AFref| > 0.08 \to \Diamond_{[0,2]}(|AF - AFref| < 0.08))$: whenever
  the deviation is too large, it should recover to the normal status soon.

Automatic transmission (AT) is a widely-used benchmark [18-20], implementing
the transmission controller of an automotive system. It outputs the gear, speed
and RPM of the vehicle, which are required to satisfy this safety requirement:

– $\varphi_1^{\mathrm{AT}} := \Box_{[0,27]}(speed > 50 \to \Diamond_{[1,3]}(RPM < 3000))$: whenever the speed is
  higher than 50, the RPM should be below 3000 in three time units.

³ Available at https://github.com/choshina/STL-causation-monitor, and Zenodo [39].


Baseline and Experimental Design. In order to assess our two proposed 
monitors (the Boolean causation monitor BCauM in Definition 6, and the quan- 
titative causation monitor QCauM in Definition 7), we compare them with two 
baseline monitors: the classic quantitative robustness monitor ClaM (see Defi- 
nition 4); and the state-of-the-art approach monitor with reset ResM [40], that, 
once the signal violates the specification, resets at that point and forgets the 
previous partial signal. 

Given a model and a specification, we generate input signals by randomly 
sampling in the input space and feed them to the model. The online output 
signals are given as inputs to the monitors and the monitoring results are col- 
lected. We generate 10 input signals for each model and specification. To account 
for fluctuation of monitoring times in different repetitions⁴, for each signal, the
experiment was executed 10 times, and we report average results.
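The repetition scheme described above can be sketched as a small timing harness. This is not the authors' harness (their tool is built on Breach); it is only a Python illustration in which `time_monitor`, the `monitor` callable and the sample signal are hypothetical.

```python
import statistics
import time

def time_monitor(monitor, signal, repetitions=10):
    """Run `monitor(signal)` several times; return (mean_ms, stdev_ms)."""
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        monitor(signal)                   # result is deterministic; only time varies
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in monitor: the minimum of the signal, used only to have something to time.
mean_ms, stdev_ms = time_monitor(lambda s: min(s), [0.3, -0.1, 0.2])
print(f"avg = {mean_ms:.4f} ms, stdv = {stdev_ms:.4f} ms")
```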


5.2 Evaluation 


Qualitative Evaluation. We here show the type of information provided by 
the different monitors. As an example, Fig. 7 reports, for two specifications of 
the two models, the system output signal (in the top of the two sub-figures), and 
the monitoring results of the compared monitors. We notice that signals of both 
models (top plots) violate the corresponding specifications in multiple points. 
Let us consider the monitoring results of the AFC specification; similar observations apply to the AT specification.
When using the ClaM, only the first violation right after time 15 is detected 
(the upper bound of robustness becomes negative); after that, the upper bound 
remains constant, without reporting that the system recovers from violation at 
around time 17, and that the specification is violated again four more times. 
Instead, we notice that the monitor with reset ResM is able to detect all 
the violations (as the upper bound becomes greater than 0 when the violation 
episode ends), but it does not properly report the margin of robustness; indeed, 
during the violation episodes, it reports a constant value of around —0.4 for the 
upper bound, but the system violates the specification with different degrees of 
severity in these intervals; in a similar way, when the specification is satisfied 
after around time 17, the upper bound is just above 0, but actually the system
satisfies the specification with different margins. As a consequence, ResM provides
sharp changes of the robustness upper bound that do not faithfully reflect the
system evolution.

⁴ Note that only the monitoring time changes across different repetitions; monitoring
results are instead always the same, as monitoring is deterministic for a given signal.

Fig. 7. Examples of the information provided by the different monitors

Table 1. Experimental results: average (avg.) and standard deviation (stdv.) of
monitoring and simulation times (ms)

Spec.                        | ClaM monitor | ClaM total   | ResM monitor | ResM total   | BCauM monitor | BCauM total   | QCauM monitor | QCauM total
                             | avg.   stdv. | avg.   stdv. | avg.   stdv. | avg.   stdv. | avg.   stdv.  | avg.   stdv.  | avg.   stdv.  | avg.   stdv.
$\varphi_1^{\mathrm{AFC}}$   | 14.6   0.1   | 982.8  3.5   | 8.8    2.4   | 981.3  6.7   | 36.9   5.4    | 1009.7 16.5   | 15.1   0.1    | 981.9  4.4
$\varphi_2^{\mathrm{AFC}}$   | 26.8   0.2   | 998.5  9.0   | 20.2   5.2   | 988.0  9.9   | 50.4   22.4   | 1023.9 25.1   | 27.4   0.2    | 999.5  8.2
$\varphi_3^{\mathrm{AFC}}$   | 42.0   0.3   | 1016.5 8.9   | 45.5   4.8   | 1016.9 7.5   | 48.4   6.2    | 1021.2 7.9    | 81.0   1.2    | 1060.1 5.3
$\varphi_1^{\mathrm{AT}}$    | 16.7   0.2   | 966.0  2.6   | 24.0   17.0  | 980.4  24.2  | 96.1   82.6   | 1065.2 93.4   | 31.2   0.6    | 985.0  7.5

We notice that the Boolean causation monitor BCauM only reports information
about the violation episodes, but not on the degree of violation/satisfaction.
Instead, the quantitative causation monitor QCauM is able to provide very
detailed information, not only reporting all the violation episodes, but also
properly characterizing the degree to which the specification is violated or satisfied.
Indeed, in QCauM, the violation causation distance smoothly increases from
violation to satisfaction, thus faithfully reflecting the system evolution.


Quantitative Assessment of Monitoring Time. We discuss the computation
cost of monitoring.
Table 2. Experimental results of the four monitoring approaches: monitoring time
(ms); ΔA = (QCauM − A)/A, reported in %

φ_1^AFC | ClaM | ResM | BCauM | QCauM | ΔClaM (%) | ΔResM (%) | ΔBCauM (%)
#1      | 14.5 |  8.2 |  37.4 |  15.2 |       4.8 |      85.4 |      -59.4
#2      | 14.5 |  8.1 |  39.9 |  15.0 |       3.4 |      85.2 |      -62.4
#3      | 14.8 |  8.0 |  38.2 |  15.0 |       1.4 |      87.5 |      -60.7
#4      | 14.7 |  8.5 |  38.8 |  15.3 |       4.1 |      80.0 |      -60.6
#5      | 14.6 |  8.0 |  37.3 |  14.9 |       2.1 |      86.3 |      -60.1
#6      | 14.6 |  8.2 |  37.6 |  15.1 |       3.4 |      84.1 |      -59.8
#7      | 14.6 | 15.5 |  21.6 |  15.0 |       2.7 |      -3.2 |      -30.6
#8      | 14.7 |  7.9 |  39.5 |  15.0 |       2.0 |      89.9 |      -62.0
#9      | 14.6 |  7.8 |  39.9 |  15.1 |       3.4 |      93.6 |      -62.2
#10     | 14.5 |  8.0 |  38.4 |  15.1 |       4.1 |      88.8 |      -60.7

φ_2^AFC | ClaM | ResM | BCauM | QCauM | ΔClaM (%) | ΔResM (%) | ΔBCauM (%)
#1      | 26.8 | 19.8 |  45.9 |  27.4 |       2.2 |      38.4 |      -40.3
#2      | 27.1 | 27.3 |  27.6 |  27.8 |       2.6 |       1.8 |        0.7
#3      | 26.6 | 26.2 |  30.0 |  27.5 |       3.4 |       5.0 |       -8.3
#4      | 26.6 | 14.2 | 107.2 |  27.0 |       1.5 |      90.1 |      -74.8
#5      | 26.7 | 15.8 |  50.9 |  27.3 |       2.2 |      72.8 |      -46.4
#6      | 26.6 | 15.8 |  56.4 |  27.2 |       2.3 |      72.2 |      -51.8
#7      | 26.8 | 25.4 |  33.5 |  27.5 |       2.6 |       8.3 |      -17.9
#8      | 26.9 | 17.0 |  51.9 |  27.4 |       1.9 |      61.2 |      -47.2
#9      | 27.1 | 25.1 |  50.9 |  27.6 |       1.8 |      10.0 |      -45.8
#10     | 26.7 | 15.8 |  50.1 |  27.3 |       2.2 |      72.8 |      -45.5

φ_3^AFC | ClaM | ResM | BCauM | QCauM | ΔClaM (%) | ΔResM (%) | ΔBCauM (%)
#1      | 42.1 | 49.2 |  49.1 |  81.2 |      92.9 |      65.0 |       65.4
#2      | 42.5 | 42.2 |  42.2 |  82.1 |      93.2 |      94.5 |       94.5
#3      | 41.8 | 48.8 |  48.8 |  81.5 |      95.0 |      67.0 |       67.0
#4      | 42.0 | 34.9 |  63.4 |  78.8 |      87.6 |     125.8 |       24.3
#5      | 41.7 | 48.9 |  48.7 |  79.6 |      90.9 |      62.8 |       63.4
#6      | 41.7 | 48.5 |  48.7 |  79.7 |      91.1 |      64.3 |       63.7
#7      | 42.3 | 42.7 |  42.5 |  81.9 |      93.6 |      91.8 |       92.7
#8      | 42.1 | 42.2 |  42.0 |  81.6 |      93.8 |      93.4 |       94.3
#9      | 42.3 | 49.1 |  49.3 |  82.6 |      95.3 |      68.2 |       67.5
#10     | 41.6 | 48.6 |  49.1 |  80.8 |      94.2 |      66.3 |       64.6

φ_1^AT  | ClaM | ResM | BCauM | QCauM | ΔClaM (%) | ΔResM (%) | ΔBCauM (%)
#1      | 16.9 | 30.7 |  29.6 |  32.1 |      89.9 |       4.6 |        8.4
#2      | 16.7 | 17.4 |  17.4 |  31.9 |      91.0 |      83.3 |       83.3
#3      | 16.7 | 16.8 | 253.4 |  31.0 |      85.6 |      84.5 |      -87.8
#4      | 16.9 | 69.7 |  70.2 |  31.8 |      88.2 |     -54.4 |      -54.7
#5      | 16.8 | 19.6 | 135.9 |  31.0 |      84.5 |      58.2 |      -77.2
#6      | 16.5 | 26.5 | 200.5 |  30.2 |      83.0 |      14.0 |      -84.9
#7      | 16.6 | 14.6 |  37.9 |  31.0 |      86.7 |     112.3 |      -18.2
#8      | 16.8 | 16.4 | 143.8 |  31.4 |      86.9 |      91.5 |      -78.2
#9      | 16.3 | 13.9 |  38.6 |  31.0 |      90.2 |     123.0 |      -19.7
#10     | 16.5 | 14.2 |  33.2 |  30.9 |      87.3 |     117.6 |       -6.9


In Table 1, we observe that, for all the monitors, the monitoring time is much 
lower than the total time (system execution + monitoring). This shows that, for
this type of system, the monitoring overhead is negligible. Still, we compare the
execution costs for the different monitors. Table 2 reports the monitoring times 
of all the monitors for each specification and each signal. Moreover, it reports 
the percentage difference between the quantitative causation monitor QCauM (the 
most informative one) and the other monitors. 
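The percentage difference reported in Table 2 can be recomputed from the raw monitoring times. The following is a minimal sketch, assuming the definition ΔA = (QCauM − A)/A from the table caption and the times of signal #1 of the first specification as read from the table; the function name `pct_diff` is only illustrative.

```python
def pct_diff(qcaum_ms: float, other_ms: float) -> float:
    """Percentage difference (QCauM - A) / A, expressed in percent."""
    return 100.0 * (qcaum_ms - other_ms) / other_ms

# Monitoring times (ms) of signal #1, first specification, as read from Table 2.
times = {"ClaM": 14.5, "ResM": 8.2, "BCauM": 37.4}
qcaum = 15.2

# Round to one decimal, as in the table.
deltas = {name: round(pct_diff(qcaum, t), 1) for name, t in times.items()}
print(deltas)
```

A positive value means that QCauM is slower than the other monitor; a negative value means that it is faster.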

We first observe that ResM and BCauM have, for the same specification, 
high variance of the monitoring times across different signals. ClaM and QCauM, 
instead, provide very consistent monitoring times. This is confirmed by the
standard deviation results in Table 1. The consistent monitoring cost of QCauM is a
good property, as the designers of the monitor can precisely forecast how long 
the monitoring will take, and design the overall system accordingly. 

We observe that QCauM is negligibly slower than ClaM for φ_1^AFC and φ_2^AFC, and
at most twice as slow for the other two specifications. This additional monitoring
cost is acceptable, given the additional information provided by QCauM. Compared
to ResM, QCauM is usually slower (at most around twice as slow); also in this
case, as QCauM provides more information than ResM, the cost is acceptable.

Compared to the Boolean causation monitor BCauM, QCauM is usually faster,
as it does not have to collect epochs, which is a costly operation. However, we
observe that it is slower for φ_3^AFC because, for this specification, most of the
signals do not violate it (and so BCauM does not collect epochs in this case either).

To conclude, QCauM is a monitor able to provide much more information than
existing monitors, with an acceptable overhead in terms of monitoring time.


6 Related Work 


Monitoring of STL. Monitoring can be performed either offline or online.
Offline monitoring [16,30,33] targets complete traces and returns either true or
false. In contrast, online monitoring deals with partial traces, and thus a
three-valued semantics was introduced for LTL monitoring [7,8], and later
for MTL and STL qualitative online monitoring [24,31], to handle the situation
where no conclusive verdict can be reached yet. Usually, quantitative online
monitoring provides a quantitative value or a robust satisfaction interval
[12–14,25,26]. Based on them, several tools have been developed, e.g., AMT [32,33],
Breach [15], and S-Taliro [1]. We refer to the survey [3] for a comprehensive
introduction. Recently, in [35], Qin and Deshmukh propose clairvoyant monitoring to
forecast future signal values and give probabilistic bounds on the specification
validity. In [2], an online monitoring approach is proposed for perception systems
with Spatio-temporal Perception Logic [23].


Monotonicity Issue. However, most of these works do not handle the monotonicity
issue stated in this paper. In [10], Cimatti et al. propose an assumption-based
monitoring framework for LTL. It takes the user expertise into account and
allows the monitor to be reset, in the sense that it can restart from any discrete
time point. In [37], a recovery feature is introduced in their online monitor [25].
However, the technique is an application-specific approach, rather than a general
framework. In [40], a reset mechanism is proposed for STL online monitors. However,
as experimentally evaluated in Sect. 5, it essentially provides a solution
for the Boolean semantics and still exhibits monotonicity between two resetting
points.


Signal Diagnostics. Signal diagnostics [5,22,32] was originally used in an offline
manner, for the purpose of fault localization and system debugging. In [22], the
authors propose an approach to automatically identify the single evaluations
(namely, epochs) that account for the satisfaction/violation of an STL specification,
for a complete trace. This information can be further used as a reference
for detecting the root cause of bugs in CPS [5,6,32]. The online
version of signal diagnostics, which is the basis of our Boolean causation monitor,
is introduced in [40]. However, we show in Sect. 5 that the monitor based on
this technique is often costly, and is not able to deliver the quantitative runtime
information that the quantitative causation monitor provides.


7 Conclusion and Future Work 


In this paper, we propose a new way of doing STL monitoring based on causation
that is able to provide more information than classic monitoring based on
STL robustness. Concretely, we propose two causation monitors, namely BCauM
and QCauM. In particular, BCauM intuitively explains the concept of “causation”
monitoring, and thus paves the way to QCauM, which is more practically valuable.
We further prove the relation between the proposed causation monitors and the
classic ones.

As future work, we plan to improve the efficiency of the monitoring, by avoiding
some unnecessary computations for some instants. Moreover, we plan to apply
it to the monitoring of real-world systems.
Abstract. We characterize all common notions of behavioral equivalence
by one 6-dimensional energy game, where energies bound capabilities
of an attacker trying to tell processes apart. The defender-winning
initial credits exhaustively determine which preorders and equivalences
from the (strong) linear-time–branching-time spectrum relate processes.

The time complexity is exponential, which is optimal due to trace
equivalence being covered. This complexity improves drastically on our
previous approach for deciding groups of equivalences where exponential
sets of distinguishing HML formulas are constructed on top of a super-exponential
reachability game. In experiments using the VLTS benchmarks,
the algorithm performs on par with the best similarity algorithm.


Keywords: Bisimulation · Energy games · Process equivalence spectrum


1 Introduction 


Many verification tasks can be understood along the lines of “how equivalent” two 
models are. Figure 1 replicates a standard example, known for instance from the 



Fig. 1. A specification of mutual exclusion Mx, and Peterson’s protocol Pe. 
textbook Reactive Systems [3]: A specification of mutual exclusion Mx as two
alternating users A and B entering their critical section ec_A/ec_B and leaving
lc_A/lc_B before the other may enter; and the transition system of Peterson’s [28]
mutual exclusion algorithm Pe, minimized by weak bisimilarity, with internal
steps τ due to the coordination that needs to happen. For Pe to faithfully
implement mutual exclusion, it should behave somewhat similarly to Mx.

Semantics in concurrent models must take nondeterminism into account. Set- 
ting the degree to which nondeterminism counts induces equivalence notions with 
subtle differences: Pe and Mx weakly simulate each other, meaning that a tree 
of options from one process can be matched by a similar tree of the other. This 
implies that they have the same weak traces, that is, matching paths. However, 
they are not weakly bisimilar, which would require a higher degree of symmetry
than mutual simulation, namely, matching absence of options. There are many
more such notions. Van Glabbeek’s linear-time–branching-time spectrum [21]
(cf. Fig. 3) brings order to the hierarchy of equivalences. But it is notoriously
difficult to navigate. In our example, one might wonder: Are there notions relat- 
ing the two besides mutual simulation? 

Our recent algorithm for linear-time–branching-time spectroscopy by Bisping,
Nestmann, and Jansen [7,9] is capable of answering equivalence questions
for finite-state systems by deciding the spectrum of behavioral equivalences in one
go. In theory, that is. In practice, the algorithm of [7] runs out of memory when
applied to the weak transition relation of even small examples like Pe. The reason
for this is that saturating transition systems with the closure of weak steps
adds a lot of nondeterminism. For instance, Pe may reach 10 different states
by internal steps (−τ→*). The spectroscopy algorithm of [7] builds a bisimulation
game where the defender wins if the game starts at a pair of equivalent processes.
To allow all attacks relevant for the spectrum, the [7]-game must consider
partitionings of state sets reached through nondeterminism. There are 115,975 ways
of partitioning 10 objects. As a consequence, the game graph of [7] comparing
Pe and Mx has 266,973 game positions. On top of each position, [7] builds sets
of distinguishing formulas of Hennessy–Milner modal logic (HML) [21,24] with
minimal expressiveness. These sets may grow exponentially. Game over!
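The count of 115,975 partitionings is the Bell number of 10. As a small sanity check, the following sketch computes Bell numbers with the Bell triangle; the function name `bell_number` is only illustrative and not part of the spectroscopy algorithm.

```python
def bell_number(n: int) -> int:
    """Number of ways to partition a set of n objects (Bell number B_n)."""
    row = [1]                        # first row of the Bell triangle, B_0 = 1
    for _ in range(n):
        new_row = [row[-1]]          # a row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

print(bell_number(10))
```

The growth of these numbers is what makes a game that branches over partitionings infeasible even for small state sets.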


Contributions. In this paper, we adapt the spectroscopy approach of [7,9] to 
render small verification instances like Pe/Mx feasible. The key ingredients that 
will make the difference are: understanding the spectrum purely through depth- 
properties of HML formulas; using multidimensional energy games [15] instead of 
reachability games; and exploiting the considered spectrum to drastically reduce 
the branching-degree of the game as well as the height of the energy lattice. 
Figure 2 lays out the algorithm with pointers to key parts of this paper. 


— Subsection 2.2 explains how the linear-time–branching-time spectrum can
be understood in terms of six dimensions of HML expressiveness, and
Subsect. 3.1 introduces a class of declining energy games fit for our task.

— In Subsect. 3.2, we describe the novel spectroscopy energy game, and, in
Subsect. 3.3, prove it to characterize all notions of equivalence definable by the
six dimensions.
Fig. 2. Overview of the computations — and correspondences ~ we will discuss.


— Subsection 3.4 shows that a more clever game with only linear branching- 
factor still covers the spectrum. 

— Subsection 4.1 provides an algorithm to compute winning initial energy levels
for declining energy games with min_{…}, which enables decision of the whole
considered spectrum in 2^O(|P|) for systems with |P| processes (Subsect. 4.2).

— In Subsect. 4.3, we add fine print on how to obtain equivalences and distin- 
guishing formulas in the algorithm. 

— Section 5 compares to [7] and [29] through experiments with the widely used
VLTS benchmark suite [18]. The experiments also reveal insights about the 
suite itself. 


2 Distinctions and Equivalences in Transition Systems 


Two classic concepts of system analysis form the background of this paper: 
Hennessy—Milner logic (HML) interpreted over transition systems goes back to 
Hennessy and Milner [24] investigating observational equivalence in operational 
semantics. Van Glabbeek’s linear-time–branching-time spectrum [21] arranges all
common notions of equivalence as a hierarchy of HML sublanguages. 


2.1 Transition Systems and Hennessy—Milner Logic 


Definition 1 (Labeled transition system). A labeled transition system is
a tuple S = (P, Σ, →) where P is the set of processes, Σ is the set of actions,
and → ⊆ P × Σ × P is the transition relation.

By I(p) we denote the actions enabled initially for a process p ∈ P, that
is, I(p) = {a ∈ Σ | ∃p'. p −a→ p'}. We lift the steps to sets with P −a→ P' iff
P' = {p' | ∃p ∈ P. p −a→ p'}.
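To make Definition 1 concrete, here is a minimal sketch of a finite labeled transition system with the initial actions of a process and the lifting of steps to sets. The class name `LTS`, its method names, and the tiny example system are assumptions for illustration only.

```python
from collections import defaultdict

class LTS:
    """A finite labeled transition system given by (p, a, p') triples."""

    def __init__(self, transitions):
        self.succ = defaultdict(set)            # (p, a) -> set of successors p'
        self.processes, self.actions = set(), set()
        for p, a, p2 in transitions:
            self.succ[(p, a)].add(p2)
            self.processes |= {p, p2}
            self.actions.add(a)

    def initials(self, p):
        """Actions enabled initially for process p."""
        return {a for a in self.actions if self.succ.get((p, a))}

    def step_set(self, ps, a):
        """Lifted step: every process reachable from some p in ps by one a-step."""
        return {p2 for p in ps for p2 in self.succ.get((p, a), set())}

# Hypothetical example system (not taken from the paper).
lts = LTS([("p", "a", "p1"), ("p", "a", "p2"), ("p1", "b", "p3"), ("q", "a", "q1")])
print(lts.initials("p"), lts.step_set({"p", "q"}, "a"))
```

Note that the lifted step is a function on sets: a set without any matching transition steps to the empty set.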

Hennessy—Milner logic expresses observations that one may make on such a 
system. The set of formulas true of a process offers a denotation for its semantics. 


Definition 2 (Hennessy–Milner logic). The syntax of Hennessy–Milner
logic over a set Σ of actions, HML[Σ], is defined by the grammar:


φ ::= ⟨a⟩φ          with a ∈ Σ
    | ⋀{ψ, ψ, …}
ψ ::= ¬φ | φ.
bisimulation B            (∞, ∞, ∞, ∞, ∞, ∞)
2-nested simulation 2S    (∞, ∞, ∞, ∞, ∞, 1)
ready simulation RS       (∞, ∞, ∞, ∞, 1, 1)
readiness traces RT       (∞, ∞, ∞, 1, 1, 1)
possible futures PF       (∞, 2, ∞, ∞, ∞, 1)
simulation 1S             (∞, ∞, ∞, ∞, 0, 0)
failure traces FT         (∞, ∞, ∞, 0, 1, 1)
readiness R               (∞, 2, 1, 1, 1, 1)
revivals RV               (∞, 2, 1, 0, 1, 1)
impossible futures IF     (∞, 2, 0, 0, ∞, 1)
failures F                (∞, 2, 0, 0, 1, 1)
traces T                  (∞, 1, 0, 0, 0, 0)
enabledness E             (1, 1, 0, 0, 0, 0)


Fig. 3. Hierarchy of equivalences/preorders becoming finer towards the top.


Its semantics ⟦·⟧^S over a transition system S = (P, Σ, →) is given as the set
of processes where a formula “is true” by:


⟦⟨a⟩φ⟧^S := {p ∈ P | ∃p' ∈ ⟦φ⟧^S. p −a→ p'}
⟦⋀_{i∈I} ψ_i⟧^S := ⋂{⟦ψ_i⟧^S | i ∈ I ∧ ∄φ. ψ_i = ¬φ}
                    \ ⋃{⟦φ⟧^S | ∃i ∈ I. ψ_i = ¬φ}.


HML basically extends propositional logic with a modal observation operation.
Conjunctions then bound trees of future behavior. Positive conjuncts mean lower
bounds, negative ones impose upper bounds. For the scope of this paper, finite
bounds suffice, i.e., conjunctions are finite-width. The empty conjunction ⊤ :=
⋀∅ is usually omitted in writing.
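The semantics can be sketched as a recursive evaluator. The representation below (classes `Obs` and `Conj`, with positive and negated conjuncts kept in separate tuples) and the small example system are assumptions made for illustration; they are not the data structures of the paper's tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obs:                 # observation <a>phi
    action: str
    cont: object

@dataclass(frozen=True)
class Conj:                # conjunction with positive and negated conjuncts
    pos: tuple = ()
    neg: tuple = ()

TOP = Conj()               # the empty conjunction

def sat(phi, processes, trans):
    """Processes where phi is true; trans is a set of (p, a, p') triples."""
    if isinstance(phi, Obs):
        targets = sat(phi.cont, processes, trans)
        return {p for (p, a, p2) in trans if a == phi.action and p2 in targets}
    result = set(processes)                 # empty conjunction holds everywhere
    for psi in phi.pos:
        result &= sat(psi, processes, trans)
    for psi in phi.neg:
        result -= sat(psi, processes, trans)
    return result

# Hypothetical system: S may silently move to a deadlocked state Div, T may not.
procs = {"S", "T", "A", "Div", "Done"}
trans = {("S", "tau", "Div"), ("S", "tau", "A"), ("T", "tau", "A"), ("A", "ecA", "Done")}
failure = Obs("tau", Conj(neg=(Obs("ecA", TOP),)))   # after tau, ecA may be impossible
print(sat(failure, procs, trans))
```

In this example the formula holds only for S, so it distinguishes S from T.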
Fig. 4. Example system of internal decision −τ→ against an action −ec_A→.


We use Hennessy—Milner logic to capture differences between program behav- 
iors. Depending on how much of its expressiveness we use, different notions of 
equivalence are characterized. 


Definition 3 (Distinguishing formulas and preordering languages). A
formula φ ∈ HML[Σ] is said to distinguish two processes p, q ∈ P iff p ∈ ⟦φ⟧^S
and q ∉ ⟦φ⟧^S. A sublanguage of Hennessy–Milner logic, O_X ⊆ HML[Σ], either
distinguishes two processes, p ⋠_X q, if it contains a distinguishing formula, or
preorders them otherwise. If processes are preordered in both directions, p ⪯_X q
and q ⪯_X p, then they are considered X-equivalent, p ∼_X q.


Fig. 3 charts the linear-time–branching-time spectrum. If processes are
preordered/equated by one notion of equivalence, they also are preordered/equated
by every notion below. We will later formally characterize the notions through
Proposition 1. For a thorough presentation, we point to [23]. For those familiar
with the spectrum, the following example serves to refresh memories.


Example 1. Fig. 4 shows a tiny slice of the weak-step-saturated version of our
initial example from Fig. 1 that is at the heart of why Pe and Mx are not
bisimulation-equivalent. The difference between S and S’ is that S can internally
transition to Div (labeled −τ→) without ever performing an ec_A action. We can
express this difference by the formula φ_S := ⟨τ⟩⋀{¬⟨ec_A⟩}, meaning “after τ, ec_A may
be impossible.” It is true for S, but not for S’. Knowing a distinguishing formula
means that S and S’ cannot be bisimilar by the Hennessy–Milner theorem. The
formula φ_S is called a failure (or refusal) as it specifies a set of actions that
are disabled after a trace. In the other direction of comparison, the negation
φ_S’ := ⋀{¬⟨τ⟩⋀{¬⟨ec_A⟩}} distinguishes S’ from S. The differences between the
two processes cannot be expressed in HML without negation. Therefore the
processes are simulation-equivalent, or similar, as similarity is characterized by the
positive fragment of HML.


2.2 Price Spectra of Behavioral Equivalences 


For algorithms exploring the linear-time–branching-time spectrum, it is convenient
to have a representation of the spectrum in terms of numbers or “prices”
of formulas as in [7]. We, here, use six dimensions to characterize the notions 
of equivalence depicted in Fig. 3. The numbers define the HML observation lan- 
guages that characterize the very preorders/equivalences. Intuitively, the colorful 
Fig. 5. Pricing e of formula ⟨τ⟩⋀{⟨ec_A⟩⟨lc_A⟩⊤, ⟨τ⟩⊤, ¬⟨ec_B⟩⊤}.


numbers mean: (1) Formula modal depth of observations. (2) Formula nesting 
depth of conjunctions. (3) Maximal modal depth of deepest positive clauses in 
conjunctions. (4) Maximal modal depth of the other positive clauses in conjunc- 
tions. (5) Maximal modal depth of negative clauses in conjunctions. (6) Formula 
nesting depth of negations. More formally: 


Definition 4 (Energies). We denote as energies, En, the set of N-dimensional
vectors (ℕ)^N, and as extended energies, En_∞, the set (ℕ ∪ {∞})^N.

We compare energies component-wise, i.e., (e_1, …, e_N) ≤ (f_1, …, f_N) if
e_i ≤ f_i for each i. Least upper bounds sup are defined as usual as component-wise
supremum, as are greatest lower bounds inf.


Definition 5 (Formula prices). The expressiveness price expr: HML[Σ] →
(ℕ)^6 of a formula interpreted as 6-dimensional energies is defined recursively
by:


expr(⟨a⟩φ) := (1 + expr_1(φ), expr_2(φ), expr_3(φ), expr_4(φ), expr_5(φ), expr_6(φ))

expr(¬φ)   := (expr_1(φ), expr_2(φ), expr_3(φ), expr_4(φ), expr_5(φ), 1 + expr_6(φ))

expr(⋀_{i∈I} ψ_i) := sup({(0,
                           1 + sup_{i∈I} expr_2(ψ_i),
                           sup_{i∈R} expr_1(ψ_i),
                           sup_{i∈Pos\R} expr_1(ψ_i),
                           sup_{i∈Neg} expr_1(ψ_i),
                           0)} ∪ {expr(ψ_i) | i ∈ I})


Neg := {i ∈ I | ∃φ'_i. ψ_i = ¬φ'_i}
Pos := I \ Neg
R   := ∅ if Pos = ∅,
       {r} for some r ∈ Pos where expr_1(ψ_r) maximal for Pos.
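The recursion of Definition 5 can be sketched directly. The formula classes (`Obs`, `Neg`, `Conj`) are an assumed representation for illustration; the expected prices in the comments are the ones stated for the formulas of Example 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obs:                 # <a>phi
    action: str
    cont: object

@dataclass(frozen=True)
class Neg:                 # negated clause
    cont: object

@dataclass(frozen=True)
class Conj:                # conjunction of clauses (formulas or Neg)
    clauses: tuple = ()

TOP = Conj()

def expr(phi):
    """Six-dimensional expressiveness price of a formula or clause."""
    if isinstance(phi, Obs):
        e = expr(phi.cont)
        return (1 + e[0],) + e[1:]
    if isinstance(phi, Neg):
        e = expr(phi.cont)
        return e[:5] + (1 + e[5],)
    prices = [expr(psi) for psi in phi.clauses]
    neg = [e for psi, e in zip(phi.clauses, prices) if isinstance(psi, Neg)]
    pos = [e for psi, e in zip(phi.clauses, prices) if not isinstance(psi, Neg)]
    pos.sort(key=lambda e: e[0], reverse=True)
    deepest, others = pos[:1], pos[1:]       # R: one deepest positive clause, if any
    own = (0,
           1 + max((e[1] for e in prices), default=0),
           max((e[0] for e in deepest), default=0),
           max((e[0] for e in others), default=0),
           max((e[0] for e in neg), default=0),
           0)
    # Component-wise supremum of the conjunction's own vector and all clause prices.
    return tuple(max(v[k] for v in [own] + prices) for k in range(6))

phi_s = Obs("tau", Conj((Neg(Obs("ecA", TOP)),)))
phi_s2 = Conj((Neg(phi_s),))
print(expr(phi_s), expr(phi_s2))   # stated prices: (2,2,0,0,1,1) and (2,3,0,0,2,2)
```

In this reading, the empty conjunction already counts as one conjunction, which is why an observation ⟨a⟩⊤ is priced (1, 1, 0, 0, 0, 0).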
Fig. 6. Cut through the price lattice with dimensions 2 (conjunction nesting) and
5 (negated observation depth).


Figure 5 gives an example of how the prices compound. The colors of the lines
match those used for the dimensions and their updates in the other figures.
Circles mark the points that are counted. The formula itself expresses a so-called
ready-trace observation: We observe a trace τ · ec_A · lc_A and, along the
way, may check what other options would have been enabled or disabled. Here,
we check that τ is enabled and ec_B is disabled after the first τ-step. With the
pricing, we can characterize all standard notions of equivalence:


Proposition 1. On finite systems, the languages of formulas with prices below
the coordinates given in Fig. 3 characterize the named notions of equivalence,
that is, p ⪯_X q with respect to equivalence X, iff no φ with expr(φ) ≤ e_X
distinguishes p from q.


Example 2. The formulas of Example 1 have prices: expr(⟨τ⟩⋀{¬⟨ec_A⟩}) =
(2,2,0,0,1,1) for φ_S and expr(⋀{¬⟨τ⟩⋀{¬⟨ec_A⟩}}) = (2,3,0,0,2,2) for φ_S’. The
prices of the two are depicted as red marks in Fig. 6. This also visualizes how φ_S’ is
a counterexample for bisimilarity and how φ_S is a counterexample for failure and
finer preorders. Indeed the two preorders are coarsest ways of telling the processes
apart. So, S and S’ are equated by all preorders below the marks, i.e. similarity,
S ∼_1S S’, and coarser preorders (S ∼_T S’, S ∼_E S’). This carries over to the
initial example of Peterson’s mutex protocol from Fig. 1, where weak simulation,
Pe ∼_1WS Mx, is the most precise equivalence. Practically, this means that the
specification Mx has liveness properties not upheld by the implementation Pe.
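The reading of Example 2 can be replayed by comparing the two prices against the coordinates of Fig. 3. The sketch below assumes the coordinates as they can be read from the figure; the dictionary `COORDINATES` and the function names are illustrative only.

```python
INF = float("inf")

# Coordinates of the notions as read from Fig. 3.
COORDINATES = {
    "bisimulation":        (INF, INF, INF, INF, INF, INF),
    "2-nested simulation": (INF, INF, INF, INF, INF, 1),
    "ready simulation":    (INF, INF, INF, INF, 1, 1),
    "readiness traces":    (INF, INF, INF, 1, 1, 1),
    "possible futures":    (INF, 2, INF, INF, INF, 1),
    "simulation":          (INF, INF, INF, INF, 0, 0),
    "failure traces":      (INF, INF, INF, 0, 1, 1),
    "readiness":           (INF, 2, 1, 1, 1, 1),
    "revivals":            (INF, 2, 1, 0, 1, 1),
    "impossible futures":  (INF, 2, 0, 0, INF, 1),
    "failures":            (INF, 2, 0, 0, 1, 1),
    "traces":              (INF, 1, 0, 0, 0, 0),
    "enabledness":         (1, 1, 0, 0, 0, 0),
}

def leq(e, f):
    """Component-wise comparison of energies."""
    return all(a <= b for a, b in zip(e, f))

def notions_refuted_by(price):
    """Notions whose observation language contains a formula of this price."""
    return {name for name, coord in COORDINATES.items() if leq(price, coord)}

price_s, price_s2 = (2, 2, 0, 0, 1, 1), (2, 3, 0, 0, 2, 2)
surviving = set(COORDINATES) - notions_refuted_by(price_s) - notions_refuted_by(price_s2)
print(sorted(surviving))
```

Under these assumptions, only simulation, traces, and enabledness survive both formulas, matching the notions named in the example.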


Remark 1. Definition 5 deviates from our previous formula pricing of [7] in a
crucial way: We only collect the maximal depths of positive clauses, whereas [7]
tracks numbers of deep and flat positive clauses (where a flat clause is characterized
by an observation depth of 1). Our change to a purely “depth-guided”
spectrum will allow us to characterize the spectrum by an energy game and to
eliminate the Bell-numbered blow-up from the game’s branching-degree. The
special treatment of the deepest positive branch is necessary to address revival,
failure trace, and ready trace semantics, which are popular in the CSP community
[17,31].


3 An Energy Game of Distinguishing Capabilities 


Conventional equivalence problems ask whether a pair of processes is related by 
a specific equivalence. These problems can be abstracted into a more general 
“spectroscopy problem” to determine the set of equivalences from a spectrum 
that relate two processes as in [7]. This section captures the spectrum of Fig. 3 
by one rather simple energy game. 


3.1 Energy Games 


Multidimensional energy games are played on graphs labeled by vectors to be 
added to (or subtracted from) a vector of “energies” where one player must pay 
attention to the energies not being exhausted. We plan to encode the distinction 
capabilities of the semantic spectrum as energy levels in an energy game enriched 
by min_{…}-operations that take minima of components. This way, energy levels
where the defender has a winning strategy will correspond to equivalences that 
hold. We will just need updates decrementing or maintaining energy levels. 


Definition 6 (Energy updates). The set of energy updates, Up, contains
vectors (u_1, …, u_N) ∈ Up where each component is of the form


— u_k ∈ {−1, 0}, or
— u_k = min_D where D ⊆ {1, …, N} and k ∈ D.


Applying an update to an energy, upd(e, u), where e = (e_1, …, e_N) ∈ En (or
En_∞) and u = (u_1, …, u_N) ∈ Up, yields a new energy vector e' where kth
components e'_k := e_k + u_k for u_k ∈ ℤ and e'_k := min_{d∈D} e_d for u_k = min_D.
Updates that would cause any component to become negative are illegal.
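A minimal sketch of applying an update may help here. It assumes that a min_D component is represented as a `frozenset` of 1-based dimension indices and that an illegal update is signalled by returning `None`; both are choices made for this illustration.

```python
INF = float("inf")

def upd(energy, update):
    """Apply an energy update; return None if a component would become negative."""
    new = []
    for k, u in enumerate(update):
        if isinstance(u, frozenset):           # min_D, with 1-based indices in D
            new.append(min(energy[d - 1] for d in u))
        else:                                  # relative update in {-1, 0}
            new.append(energy[k] + u)
    return None if any(c < 0 for c in new) else tuple(new)

# Update with a minimum over dimensions 1 and 5 and a decrement of dimension 6.
swap = (frozenset({1, 5}), 0, 0, 0, 0, -1)
print(upd((2, 2, 0, 0, 1, 1), swap))           # first component becomes min(2, 1)
print(upd((0, 1, 0, 0, 0, 0), (-1, 0, 0, 0, 0, 0)))
```

Since every component either stays, decreases by one, or is replaced by a minimum that includes itself, an update can never raise an energy level.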


Definition 7 (Games). An N-dimensional declining energy game G[g_0, e_0] =
(G, G_d, ↣, w, g_0, e_0) is played on a directed graph uniquely labeled by energy
updates consisting of


— a set of game positions G, partitioned into
  • a set of defender positions G_d ⊆ G
  • a set of attacker positions G_a := G \ G_d,
— a relation of game moves ↣ ⊆ G × G,
— a weight function for the moves w: (↣) → Up,
— an initial position g_0 ∈ G, and
— an initial energy budget vector e_0 ∈ En_∞.

The notation g ↣^u g' stands for g ↣ g' and w(g, g') = u.


Definition 8 (Plays, energies, and wins). We call the (finite or infinite)
paths ρ = g_0 g_1 … ∈ G^∞ with g_i ↣ g_{i+1} plays of G[g_0, e_0].

The energy level of a play ρ at round i, EL_ρ(i), is recursively defined as
EL_ρ(0) := e_0 and otherwise as EL_ρ(i+1) := upd(EL_ρ(i), u_i) with
u_i = w(g_i, g_{i+1}). If we omit the index, EL_ρ, this refers to the final energy
level of a finite run ρ, i.e., EL_ρ(|ρ| − 1).

Plays where energy levels become undefined (negative) are won by the
defender. So are infinite plays. If a finite play is stuck (i.e., g_0 … g_n ↣̸),
the stuck player loses: The defender wins if g_n ∈ G_a, and the attacker wins
if g_n ∈ G_d.


Proposition 2. In this model, energy levels can only decline.


1. Updates may only decrease energies, upd(e, u) ≤ e.

2. Energy level changes are monotonic: If EL_ρg ≤ EL_σg and g ↣ g' then
EL_ρgg' ≤ EL_σgg'.

3. If e_0 ≤ e'_0 and G[g_0, e_0] has non-negative play ρ, then G[g_0, e'_0] also has
non-negative play ρ.


Definition 9 (Strategies and winning budgets). An attacker strategy is a
map from play prefixes ending in attacker positions to next game moves s: (G* ×
G_a) → G with s(g_0 … g_a) ∈ (g_a ↣ ·). Similarly, a defender strategy names
moves starting in defender states. If all plays consistent with a strategy s ensure
a player to win, s is called a winning strategy for this player. The player with
a winning strategy for G[g_0, e_0] is said to win G from position g_0 with initial
energy budget e_0. We call Win_a(g) = {e | G[g, e] is won by the attacker} the
attacker winning budgets.


Proposition 3. The attacker winning budgets at positions are upward-closed
with respect to energy, that is, e ∈ Win_a(g) and e ≤ e' implies e' ∈ Win_a(g).


This means we can characterize the set of winning attacker budgets in terms
of minimal winning budgets Win_a^min(g) = Min(Win_a(g)), where Min(S) selects
minimal elements {e ∈ S | ∄e' ∈ S. e' ≤ e ∧ e' ≠ e}. Clearly, Win_a^min must be an
antichain and thus finite due to the energies being well-partially-ordered [26].
Dually, we may consider the maximal energy levels winning for the defender,
Win_d^max: G → 2^(En_∞) where we need extended energies to bound won half-spaces.
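As a small illustration of Min and of upward-closure, the following sketch reduces a finite set of budgets to its antichain of minimal elements and then uses that antichain to decide membership. The example budgets are hypothetical and the function names are illustrative.

```python
def leq(e, f):
    """Component-wise comparison of energies."""
    return all(a <= b for a, b in zip(e, f))

def minimal_elements(budgets):
    """Min(S): elements of S that have no strictly smaller element in S."""
    budgets = set(budgets)
    return {e for e in budgets
            if not any(leq(f, e) and f != e for f in budgets)}

def is_winning(e, minimal):
    """By upward-closure, e is winning iff it lies above some minimal budget."""
    return any(leq(m, e) for m in minimal)

# Hypothetical attacker winning budgets at some position.
budgets = {(2, 2, 0, 0, 1, 1), (2, 3, 0, 0, 2, 2), (3, 2, 0, 0, 1, 1), (1, 4, 0, 0, 0, 0)}
minimal = minimal_elements(budgets)
print(sorted(minimal))
```

Storing only the antichain is what keeps the representation of an infinite, upward-closed set of winning budgets finite.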


3.2 The Spectroscopy Energy Game 


Let us now look at the “spectroscopy energy game” at the center of our contribu- 
tion. Figure 7 gives a graphical representation. The intuition is that the attacker 
shows how to construct formulas that distinguish a process p from every q in a 
set of processes Q. The energies limit the expressiveness of the formulas. The first 
dimension bounds for how many turns the attacker may challenge observations 
Fig. 7. Schematic spectroscopy game G_△ of Definition 10.


of actions. The second dimension limits how often they may use conjunctions 
to resolve nondeterminism. The third, fourth, and fifth dimensions limit how 
deeply observations may nest underneath a conjunction, where the fifth stands 
for negated clauses, the third for one of the deepest positive clauses and the 
fourth for the other positive clauses. The last dimension limits how often the 
attacker may use negations to enforce symmetry by swapping sides. The moves 
closely match productions in the grammar of Definition 2 and prices in Defini- 
tion 5. 


Definition 10 (Spectroscopy energy game). For a system S = (P, Σ, →),
the 6-dimensional spectroscopy energy game G_△^S[g_0, e_0] = (G, G_d, ↣, w, g_0, e_0)
consists of


— attacker positions (p, Q)_a ∈ G_a,
— attacker clause positions (p, q)^∧_a ∈ G_a,
— defender conjunction positions (p, Q, Q_*)_d ∈ G_d,


where p, q ∈ P and Q, Q_* ∈ 2^P, and six kinds of moves:


— observation moves   (p, Q)_a ↣^(−1,0,0,0,0,0) (p', Q')_a            if p −a→ p', Q −a→ Q',
— conj. challenges    (p, Q)_a ↣^(0,−1,0,0,0,0) (p, Q \ Q_*, Q_*)_d    if Q_* ⊆ Q,
— conj. revivals      (p, Q, Q_*)_d ↣^(min_{1,3},0,0,0,0,0) (p, Q_*)_a  if Q_* ≠ ∅,
— conj. answers       (p, Q, Q_*)_d ↣^(0,0,0,min_{3,4},0,0) (p, q)^∧_a  if q ∈ Q,
— positive decisions  (p, q)^∧_a ↣^(min_{1,4},0,0,0,0,0) (p, {q})_a,    and
— negative decisions  (p, q)^∧_a ↣^(min_{1,5},0,0,0,0,−1) (q, {p})_a    if p ≠ q.


The spectroscopy energy game is a bisimulation game in the tradition of
Stirling [33].
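The six kinds of moves can be sketched as a successor function on positions. This is only an illustration under several assumptions: positions are encoded as tagged tuples, a min_D component is a `frozenset` of 1-based indices, the update vectors are the ones listed after Definition 11, and the example system is hypothetical.

```python
from itertools import combinations

def subsets(qs):
    """All subsets of a finite set, as frozensets."""
    qs = sorted(qs)
    for r in range(len(qs) + 1):
        for combo in combinations(qs, r):
            yield frozenset(combo)

def step_set(trans, ps, a):
    """Lifted step: processes reachable from ps by one a-step."""
    return frozenset(p2 for (p, b, p2) in trans if b == a and p in ps)

def m(*dims):
    """A min-update over the given 1-based dimensions."""
    return frozenset(dims)

def moves(pos, trans):
    """Yield (update, successor position) pairs of the game."""
    if pos[0] == "att":                      # attacker position (p, Q)
        _, p, Q = pos
        for (p0, a, p2) in trans:            # observation moves
            if p0 == p:
                yield (-1, 0, 0, 0, 0, 0), ("att", p2, step_set(trans, Q, a))
        for Qs in subsets(Q):                # conjunction challenges
            yield (0, -1, 0, 0, 0, 0), ("def", p, Q - Qs, Qs)
    elif pos[0] == "def":                    # defender position (p, Q, Q*)
        _, p, Q, Qs = pos
        if Qs:                               # conjunction revival
            yield (m(1, 3), 0, 0, 0, 0, 0), ("att", p, Qs)
        for q in sorted(Q):                  # conjunction answers
            yield (0, 0, 0, m(3, 4), 0, 0), ("clause", p, q)
    else:                                    # attacker clause position (p, q)
        _, p, q = pos
        yield (m(1, 4), 0, 0, 0, 0, 0), ("att", p, frozenset({q}))
        if p != q:                           # negative decision swaps sides
            yield (m(1, 5), 0, 0, 0, 0, -1), ("att", q, frozenset({p}))

trans = {("S", "tau", "Div"), ("S", "tau", "A"), ("T", "tau", "A"), ("A", "ecA", "Done")}
start = ("att", "S", frozenset({"T"}))
for update, target in moves(start, trans):
    print(update, target)
```

The loop over `subsets(Q)` makes the exponential branching of conjunction challenges visible: an attacker position with n processes on the right-hand side has 2^n challenge moves.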
Fig. 8. Example 3 spectroscopy energy game, minimal attacker winning budgets, and
distinguishing formulas/clauses. (In order to reduce visual load, only the first compo- 
nents of the updates are written next to the edges. The other components are 0.) 


Lemma 1 (Bisimulation game, proof see [5]). p_0 and q_0 are bisimilar
iff the defender wins G_△[(p_0, {q_0})_a, e_0] for every initial energy budget e_0, i.e. if
(∞, ∞, ∞, ∞, ∞, ∞) ∈ Win_d^max((p_0, {q_0})_a).


In other words, if there are initial budgets winning for the attacker, then the
compared processes can be told apart. For G_△, the attacker “unknown initial
credit problem” in energy games [34] coincides with the “apartness problem” [20]
for processes.


Example 3. Figure 8 shows the spectroscopy energy game starting at (S, {S’})_a
from Example 1. The lower part of each node displays the node’s Win_a^min. The
magenta HML formulas illustrate distinctions relevant for the correctness argument
of the following Subsect. 3.3. Section 4 will describe how to obtain attacker
winning budgets and equivalences. The blue “symmetric” positions are definitely
won by the defender; we omit the game graph below them. We also omit the
move (S’, {S, Div})_a ↣ (S’, {S}, {Div})_d; it can be ignored as will be
discussed in Subsect. 3.4.
3.3 Correctness: Tight Distinctions


We will check that winning budgets indeed characterize what equivalences hold 
by constructing price-minimal distinguishing formulas from attacker budgets. 


Definition 11 (Strategy formulas). Given the set of winning budgets Win_a,
the set of attacker strategy formulas Strat for a position with given energy level
e is defined inductively as follows:


⟨b⟩φ ∈ Strat((p, Q)_a, e) if (p, Q)_a ↣^u (p', Q')_a, e' = upd(e, u) ∈ Win_a((p', Q')_a),
    p −b→ p', Q −b→ Q', and φ ∈ Strat((p', Q')_a, e'),

φ ∈ Strat((p, Q)_a, e) if (p, Q)_a ↣^u (p, Q, Q_*)_d, e' = upd(e, u) ∈ Win_a((p, Q, Q_*)_d),
    and φ ∈ Strat((p, Q, Q_*)_d, e'),

⋀_{q∈Q} ψ_q ∈ Strat((p, Q, ∅)_d, e) if (p, Q, ∅)_d ↣^{u_q} (p, q)^∧_a, e_q = upd(e, u_q) ∈
    Win_a((p, q)^∧_a) and ψ_q ∈ Strat((p, q)^∧_a, e_q) for each q ∈ Q,

⋀_{q∈Q∪{*}} ψ_q ∈ Strat((p, Q, Q_*)_d, e) if (p, Q, Q_*)_d ↣^{u_q} (p, q)^∧_a, e_q = upd(e, u_q) ∈
    Win_a((p, q)^∧_a) and ψ_q ∈ Strat((p, q)^∧_a, e_q) for each q ∈ Q, and if (p, Q, Q_*)_d ↣^{u_*}
    (p, Q_*)_a, e_* = upd(e, u_*) ∈ Win_a((p, Q_*)_a), and ψ_* ∈ Strat((p, Q_*)_a, e_*) is an
    observation,

φ ∈ Strat((p, q)^∧_a, e) if (p, q)^∧_a ↣^u (p, {q})_a, e' = upd(e, u) ∈ Win_a((p, {q})_a)
    and φ ∈ Strat((p, {q})_a, e') is an observation, and

¬φ ∈ Strat((p, q)^∧_a, e) if (p, q)^∧_a ↣^u (q, {p})_a, e' = upd(e, u) ∈ Win_a((q, {p})_a)
    and φ ∈ Strat((q, {p})_a, e') is an observation.


Because of the game structure, we actually know the u needed in each line
of the definition. It is u = (−1,0,0,0,0,0) in the first case; (0,−1,0,0,0,0)
in the second; (0,0,0,min_{3,4},0,0) in the third; (0,0,0,min_{3,4},0,0) and
(min_{1,3},0,0,0,0,0) in the fourth; (min_{1,4},0,0,0,0,0) in the fifth; and
(min_{1,5},0,0,0,0,−1) in the last case. Strat((p, q)^∧_a, ·) can contain negative
clauses, which form no proper formulas on their own.


Lemma 2 (Price soundness). $\varphi \in \mathsf{Strat}((p,Q)_a, e)$ implies that $\mathsf{expr}(\varphi) \leq e$
and that $\mathsf{expr}(\varphi) \in \mathsf{Win}_a((p,Q)_a)$.


Proof. By induction on the structure of $\varphi$ with arbitrary $p$, $Q$, $e$, exploiting the
alignment of the definitions of winning budgets and formula prices. Full proof
in [5].

Lemma 3 (Price completeness). $e_0 \in \mathsf{Win}_a((p_0,Q_0)_a)$ implies there are
elements in $\mathsf{Strat}((p_0,Q_0)_a, e_0)$.


Proof. By induction on the tree of winning plays consistent with some attacker
winning strategy implied by $e_0 \in \mathsf{Win}_a((p_0,Q_0)_a)$. Full proof in [5].


Lemma 4 (Distinction soundness). Every $\varphi \in \mathsf{Strat}((p,Q)_a, e)$ distinguishes
$p$ from every $q \in Q$.


Proof. By induction on the structure of $\varphi$ with arbitrary $p$, $Q$, $e$, exploiting that
$\mathsf{Strat}$ can only construct formulas with the invariant that they are true for $p$ and
false for each $q \in Q$. Full proof in [5].
Lemma 5 (Distinction completeness). If $\varphi$ distinguishes $p$ from every $q \in Q$,
then $\mathsf{expr}(\varphi) \in \mathsf{Win}_a((p,Q)_a)$.


Proof. By induction on the structure of $\varphi$ with arbitrary $p$, $Q$, exploiting the
alignment of game structure and HML semantics and the fact that $\mathsf{expr}$ cannot
"overtake" inverse updates. Full proof in [5].


Theorem 1 (Correctness). For any equivalence $X$ with coordinate $e_X$, $p \preceq_X q$
precisely if all $e_{pq} \in \mathsf{Win}_a^{\min}((p,\{q\})_a)$ are above or incomparable, $e_{pq} \not\leq e_X$.


Proof. By contraposition, in both directions.


- Assume $p \not\preceq_X q$. This means some $\varphi$ with $\mathsf{expr}(\varphi) \leq e_X$ distinguishes $p$ from $q$.
  By Lemma 5, $\mathsf{expr}(\varphi) \in \mathsf{Win}_a((p,\{q\})_a)$. Then either $\mathsf{expr}(\varphi)$ or a lower price
  $e_{pq} \leq \mathsf{expr}(\varphi)$ is a minimal winning budget, i.e., $e_{pq} \in \mathsf{Win}_a^{\min}((p,\{q\})_a)$ and
  $e_{pq} \leq e_X$.

- Assume there is $e_{pq} \in \mathsf{Win}_a^{\min}((p,\{q\})_a)$ with $e_{pq} \leq e_X$. By Lemma 3, there
  is $\varphi \in \mathsf{Strat}((p,\{q\})_a, e_{pq})$. Due to Lemma 4, $\varphi$ must be distinguishing for $p$
  and $q$, and due to Lemma 2, $\mathsf{expr}(\varphi) \leq e_{pq} \leq e_X$.


The theorem means that, by fixing an initial budget in $\mathcal{G}_\triangle$, we
obtain a characteristic game for any notion from the spectrum.


3.4 Becoming More Clever by Looking One Step Ahead 


The spectroscopy energy game $\mathcal{G}_\triangle$ of Definition 10 may branch exponentially
with respect to $|Q|$ at conjunction challenges after $(p,Q)_a$. For the spectrum we
are interested in, we can drastically limit the sensible attacker moves to four
options by a little lookahead into the enabled actions $\mathcal{I}(q)$ of $q \in Q$ and $\mathcal{I}(p)$.


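Definition 12 (Clever spectroscopy game). The clever spectroscopy game,
$\mathcal{G}_\blacktriangle$, is defined exactly like the previous spectroscopy energy game of Definition 10
with the restriction of the conjunction challenges


$(p,Q)_a \overset{(0,-1,0,0,0,0)}{\rightarrowtail} (p, Q \setminus Q_*, Q_*)_d$ with $Q_* \subseteq Q$,


to situations where $Q_* \in \{\emptyset,\ \{q \in Q \mid \mathcal{I}(q) \subseteq \mathcal{I}(p)\},\ \{q \in Q \mid \mathcal{I}(p) \subseteq \mathcal{I}(q)\},\ \{q \in Q \mid \mathcal{I}(p) = \mathcal{I}(q)\}\}$.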
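
The restriction of Definition 12 can be illustrated with a small Python sketch. It is not the paper's implementation (which is written in Scala); the function name `clever_conjunction_candidates` and the dictionary `enabled`, mapping each state to its set of initially enabled actions, are assumptions made for this example.

```python
def clever_conjunction_candidates(p, Q, enabled):
    """Return the distinct candidate sets Q_* allowed by Definition 12.

    `enabled` is a hypothetical map from states to their sets of
    initially enabled actions, standing in for I(.) in the text.
    """
    ip = enabled[p]
    candidates = [
        frozenset(),                                  # Q_* = empty set
        frozenset(q for q in Q if enabled[q] <= ip),  # I(q) subset of I(p)
        frozenset(q for q in Q if ip <= enabled[q]),  # I(p) subset of I(q)
        frozenset(q for q in Q if enabled[q] == ip),  # I(p) = I(q)
    ]
    # Several of the four sets may coincide; keep each one once.
    result = []
    for c in candidates:
        if c not in result:
            result.append(c)
    return result


enabled = {
    "p": {"a", "b"},
    "q1": {"a"},
    "q2": {"a", "b"},
    "q3": {"a", "b", "c"},
}
options = clever_conjunction_candidates("p", {"q1", "q2", "q3"}, enabled)
```

For three defender states the unrestricted game offers 2^3 = 8 choices of Q_*, whereas this sketch yields at most four regardless of the size of Q.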


Theorem 2 (Correctness of cleverness). Assume modal depth of positive
clauses $e_4 \in \{0, 1, \infty\}$, $e_4 \leq e_3$, and that modal depth of negative clauses $e_5 > 1$
implies $e_3 = e_4$. Then, the attacker wins $\mathcal{G}_\blacktriangle[(p_0,Q_0)_a, e]$ precisely if they win
$\mathcal{G}_\triangle[(p_0,Q_0)_a, e]$.


Proof. The implication from the clever spectroscopy game $\mathcal{G}_\blacktriangle$ to the full
spectroscopy game $\mathcal{G}_\triangle$ is trivial, as the attacker moves in $\rightarrowtail_\blacktriangle$ are a subset of those in
$\rightarrowtail_\triangle$ and the defender has the same moves in both games. For the other
direction, we have to show that any move $(p,Q)_a \overset{(0,-1,0,0,0,0)}{\rightarrowtail_\triangle} (p, Q \setminus Q_*, Q_*)_d$
winning at energy level $e$ can be simulated by a winning move
$(p,Q)_a \overset{(0,-1,0,0,0,0)}{\rightarrowtail_\blacktriangle} (p, Q \setminus Q', Q')_d$. Full proof in [5].
4 Computing Equivalences


The previous section has shown that attacker winning budgets in the spec- 
troscopy energy game characterize distinguishable processes and, dually, that 
the defender’s wins characterize equivalences. We now examine how to actually 
compute the winning budgets of both players. 


4.1 Computation of Attacker Winning Budgets 
The winning budgets of the attacker (Definition 9) are characterized inductively: 


- Where the defender is stuck, $g \in G_d$ and $g \not\rightarrowtail$, the attacker wins with any
  budget, $(0,0,0,0,0,0) \in \mathsf{Win}_a^{\min}(g)$.

- Where the defender has moves, $g \in G_d$ and $g \overset{u_i}{\rightarrowtail} g_i$ (for some indexing $i \in I$
  over all possible moves), the attacker wins if they have a budget equal to or
  above all budgets that might be necessary after the defender's move: If
  $\mathsf{upd}(e, u_i) \in \mathsf{Win}_a(g_i)$ for all $i \in I$, then $e \in \mathsf{Win}_a(g)$.

- Where the attacker moves, $g \in G_a$ and $g \overset{u}{\rightarrowtail} g'$, $\mathsf{upd}(e, u) \in \mathsf{Win}_a(g')$ implies
  $e \in \mathsf{Win}_a(g)$.


By Proposition 3, it suffices to find the finite set of minimal winning budgets,
$\mathsf{Win}_a^{\min}$. Turning this into a computation is not as straightforward as in other
energy game models. Due to the $\min_D$-updates, the energy update function
$\mathsf{upd}(\cdot, u)$ is neither injective nor surjective.

We must choose an inversion function $\mathsf{upd}^{-1}$ that picks minimal solutions
and that minimally "casts up" inputs that are outside the image of $\mathsf{upd}(\cdot, u)$, i.e.,
such that $\mathsf{upd}^{-1}(e', u) = \inf\{e \mid e' \leq \mathsf{upd}(e, u)\}$. We compute it as follows:


Definition 13 (Inverse update). The inverse update function is defined as
$\mathsf{upd}^{-1}(e', u) := \sup(\{e\} \cup \{m(i) \mid \exists D.\ u_i = \min_D\})$ with $e_i = e'_i - u_i$ for all $i$
where $u_i \in \{0, -1\}$ and $e_i = e'_i$ otherwise, and with $(m(i))_j = e'_i$ for $u_i = \min_D$
and $j \in D$, and $(m(i))_j = 0$ otherwise, for all $i, j$.


Example 4. Let $u := (\min_{\{1,3\}}, \min_{\{1,2\}}, -1, -1)$. $(3,4,0,1) \notin \operatorname{img}(\mathsf{upd}(\cdot, u))$, but:


$\mathsf{upd}^{-1}((3,4,0,1), u) = \sup\{(3,4,1,2), (3,0,3,0), (4,4,0,0)\} = (4,4,3,2)$
$\mathsf{upd}((4,4,3,2), u) = (3,4,2,1) \geq (3,4,0,1)$
$\mathsf{upd}^{-1}((3,4,2,1), u) = \sup\{(3,4,3,2), (3,0,3,0), (4,4,0,0)\} = (4,4,3,2)$
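
The update and its inverse can be replayed in a few lines of Python. This is a minimal sketch, assuming an encoding chosen only for this example: an update component is either the integer 0 or -1, or a tuple of 1-based indices standing for $\min_D$. The function names `upd` and `upd_inv` mirror the notation of the text but are not taken from the paper's Scala implementation.

```python
def upd(e, u):
    """Apply update u to energy e; return None if a component would drop below 0."""
    out = tuple(
        x + ui if isinstance(ui, int) else min(e[j - 1] for j in ui)
        for x, ui in zip(e, u)
    )
    return out if all(x >= 0 for x in out) else None


def upd_inv(e2, u):
    """Inverse update following Definition 13.

    Start from e (undoing the 0/-1 components) and take the supremum with
    every m(i), which lifts all components in D up to e2[i] for u_i = min_D.
    """
    e = [x - ui if isinstance(ui, int) else x for x, ui in zip(e2, u)]
    for x, ui in zip(e2, u):
        if not isinstance(ui, int):
            for j in ui:
                e[j - 1] = max(e[j - 1], x)
    return tuple(e)


# The update vector of Example 4: (min_{1,3}, min_{1,2}, -1, -1).
u = ((1, 3), (1, 2), -1, -1)
lifted = upd_inv((3, 4, 0, 1), u)
```

Running the sketch on the values of Example 4 reproduces its three lines, including the fact that two different targets are mapped to the same minimal source energy.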


With $\mathsf{upd}^{-1}$, we only need to find the $\mathsf{Win}_a^{\min}$ relation as a least fixed point of
the inductive description. This is done by Algorithm 1. Every time a new way
of winning a position for the attacker is discovered, this position is added to the
todo. Initially, these are the positions where the defender is stuck. The update
at an attacker position in Line 8 takes the inversely updated budgets ($\mathsf{upd}^{-1}$)
of successor positions to be tentative attacker winning budgets. At a defender
position, the attacker only wins if they have winning budgets for all follow-up
positions (Line 12). Any supremum of such budgets covering all follow-ups will
be winning for the attacker (Line 13). At both updates, we only select the minima
as a finite representation of the infinitely many attacker budgets.
 1 def compute_winning_budgets(𝒢 = (G, G_d, ↣, w)):
 2   attacker_win := [g ↦ {} | g ∈ G]
 3   todo := {g ∈ G_d | g ↣̸}
 4   while todo ≠ ∅:
 5     g := some todo
 6     todo := todo \ {g}
 7     if g ∈ G_a:
 8       new_attacker_win := Min(attacker_win[g] ∪ {upd^{-1}(e', u) |
           g ↣^u g' ∧ e' ∈ attacker_win[g']})
 9     else:
10       defender_post := {g' | g ↣^u g'}
11       options := {(g', upd^{-1}(e', u)) | g ↣^u g' ∧ e' ∈ attacker_win[g']}
12       if defender_post ⊆ dom(options):
13         new_attacker_win := Min({sup_{g' ∈ defender_post} strat(g') |
             strat ∈ (G → En) ∧ ∀g'. strat(g') ∈ options(g')})
14       else:
15         new_attacker_win := ∅
16     if new_attacker_win ≠ attacker_win[g]:
17       attacker_win[g] := new_attacker_win
18       todo := todo ∪ {g_p | ∃u. g_p ↣^u g}
19   Win_a^min := attacker_win
20   return Win_a^min


Algorithm 1: Algorithm for computing attacker winning budgets of declin- 
ing energy game G. 
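
To make the fixed-point iteration concrete, the following Python sketch transcribes Algorithm 1 for small explicit games. It is an illustration under stated assumptions, not the paper's Scala implementation: positions are arbitrary hashable values, `moves` maps a position to a list of `(update, successor)` pairs, a $\min_D$ component is encoded as a tuple of 1-based indices, and defender options are collected per move rather than per successor (the two coincide when there is at most one move between any two positions).

```python
from itertools import product


def upd_inv(e2, u):
    """Inverse update (Definition 13) for the encoding described above."""
    e = [x - ui if isinstance(ui, int) else x for x, ui in zip(e2, u)]
    for x, ui in zip(e2, u):
        if not isinstance(ui, int):
            for j in ui:
                e[j - 1] = max(e[j - 1], x)
    return tuple(e)


def minimal(budgets):
    """Keep only the minimal elements of a set of energy vectors."""
    def leq(a, b):
        return all(x <= y for x, y in zip(a, b))
    return {e for e in budgets if not any(f != e and leq(f, e) for f in budgets)}


def sup(vectors, dim):
    """Pointwise supremum; the supremum of no vectors is the zero vector."""
    return tuple(max((v[i] for v in vectors), default=0) for i in range(dim))


def compute_winning_budgets(positions, defender, moves, dim):
    attacker_win = {g: set() for g in positions}
    preds = {g: set() for g in positions}
    for g, outgoing in moves.items():
        for _, g2 in outgoing:
            preds[g2].add(g)
    # Initially, only stuck defender positions are known attacker wins.
    todo = {g for g in defender if not moves.get(g)}
    while todo:
        g = todo.pop()
        outgoing = moves.get(g, [])
        if g not in defender:
            # Attacker position: any successor budget, inversely updated, wins.
            new = minimal(attacker_win[g] | {
                upd_inv(e2, u) for u, g2 in outgoing for e2 in attacker_win[g2]})
        else:
            # Defender position: the attacker must cover every defender move.
            options = [{upd_inv(e2, u) for e2 in attacker_win[g2]}
                       for u, g2 in outgoing]
            if all(options):
                new = minimal({sup(choice, dim) for choice in product(*options)})
            else:
                new = set()
        if new != attacker_win[g]:
            attacker_win[g] = new
            todo |= preds[g]
    return attacker_win


# A hypothetical two-dimensional toy game, not taken from the paper.
moves = {
    "a0": [((-1, 0), "d1")],
    "d1": [((0, 0), "a1"), ((0, 0), "a2")],
    "a1": [((-1, 0), "stuck")],
    "a2": [((0, -1), "stuck")],
}
win = compute_winning_budgets(
    ["a0", "d1", "a1", "a2", "stuck"], {"d1", "stuck"}, moves, 2)
```

In the toy game the defender can force the attacker to pay in either dimension at `d1`, so the attacker needs the supremum of both successor budgets there, and one more unit of the first dimension at `a0`.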


4.2 Complexity and How to Flatten It 


For finite games, Algorithm 1 is sure to terminate, in time exponential in the
game graph's branching degree and dimensionality.


Lemma 6 (Winning budget complexity, proof see [5]). For an $N$-dimensional
declining energy game with $\rightarrowtail$ of branching degree $o$, Algorithm 1
terminates in $O(|{\rightarrowtail}| \cdot |G|^N \cdot (o + |G|^{(N-1) \cdot o}))$ time, using $O(|G|^N)$ space for the
output.


Lemma 7 (Full spectroscopy complexity). Time complexity of computing
winning budgets for the full spectroscopy energy game $\mathcal{G}_\triangle$ is in $2^{O(|P| \cdot 2^{|P|})}$.

Proof. Out-degrees $o$ in $\mathcal{G}_\triangle$ can be bounded in $O(2^{|P|})$, the whole game graph
$|{\rightarrowtail_\triangle}| \in O(|{\rightarrow}| \cdot 2^{|P|} + |P|^2 \cdot 3^{|P|})$, and game positions $|G_\triangle| \in O(|P| \cdot 3^{|P|})$. Insert
with $N = 6$ in Lemma 6. Full proof in [5].


We thus have established the approach to be double-exponential. The complexity 
of the previous spectroscopy algorithm [7] has not been calculated. One must 
presume it to be equal or higher as the game graph has Bell-numbered branching 
degree and as the algorithm computes formulas, which entails more options than 
the direct computation of energies. This is what lies behind the introduction's
observation that moderate nondeterminism already renders [7] unusable. 

Our present energy game reformulation allows us to use two ingredients to
do much better than double-exponential time when focusing on the common
linear-time–branching-time spectrum:

First, Subsect. 3.4 has established that most of the partitionings in attacker 
conjunction moves can be disregarded by looking at the initial actions of pro- 
cesses. 

Second, Fahrenberg et al. [15] have shown that considering just "capped"
energies in a grid $\mathbf{En}_K = \{0, \ldots, K\}^N$ can reduce complexity. Such a flattening
of the lattice turns the space of possible energies into a constant factor $(K+1)^N$
(with $(K+1)^{N-1}$-sized antichains) independent of input size. For Algorithm 1,
space complexity needed for attacker_win drops to $O(|G|)$ and time complexity
to $|{\rightarrowtail}| \cdot 2^{O(o)}$. If we are only interested in finitely many notions of equivalence as
in the case of Fig. 3, we can always bound the energies to range to the maximal
appearing number plus one. The last number represents all numbers outside the
bound up to infinity.
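
The capping idea can be sketched as follows. The helper names `choose_cap` and `cap` and the two sample coordinates are assumptions for this illustration only; the coordinates of the actual spectrum are those of Fig. 3.

```python
INF = float("inf")


def choose_cap(coordinates):
    """Largest finite number appearing in the coordinates, plus one."""
    finite = [x for e in coordinates for x in e if x != INF]
    return max(finite, default=0) + 1


def cap(e, K):
    """Clamp each component to {0, ..., K}; K stands for 'K or more', including infinity."""
    return tuple(min(x, K) for x in e)


# Two hypothetical six-dimensional coordinates, used only to pick a cap.
coordinates = [(INF, 0, 0, 0, 0, 0), (INF, 1, 0, 0, 1, 1)]
K = choose_cap(coordinates)
capped = cap((5, INF, 0, 1, 3, 2), K)
grid_size = (K + 1) ** 6
```

With a fixed cap the number of distinct energy vectors no longer depends on the size of the transition system, which is what removes the $|G|^N$ factor from the bounds.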


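Lemma 8 (Clever spectroscopy complexity). Time complexity of computing
winning budgets for the clever spectroscopy energy game $\mathcal{G}_\blacktriangle$ with capped
energies is in $2^{O(|P|)}$.


Proof. Out-degrees $o$ in $\mathcal{G}_\blacktriangle$ can be bounded in $O(|P|)$, the whole game graph
$|{\rightarrowtail_\blacktriangle}| \in O(|{\rightarrow}| \cdot 2^{|P|} + |P|^2 \cdot 2^{|P|})$, and game positions $|G_\blacktriangle| \in O(|P| \cdot 2^{|P|})$.
Inserting in the flattened version of Lemma 6 yields:

$O(|{\rightarrowtail_\blacktriangle}| \cdot 2^{O(o)}) = O((|{\rightarrow}| \cdot 2^{|P|} + |P|^2 \cdot 2^{|P|}) \cdot 2^{O(|P|)}) = 2^{O(|P|)}$.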


Deciding trace equivalence in nondeterministic systems is PSPACE-hard and 
will thus take at least exponential time. Therefore, the exponential time of the 
“clever” spectroscopy algorithm restricted to a finite spectrum is about as good 
as it may get, asymptotically speaking. 


4.3 Equivalences and Distinguishing Formulas from Budgets 


For completeness, let us briefly flesh out how to actually obtain equivalence
information from the minimal attacker winning budgets $\mathsf{Win}_a^{\min}((p,\{q\})_a)$ we
compute.


Definition 14. For an antichain $Mn \subseteq \mathbf{En}$ characterizing an upper part of
the energy space, the complement antichain $\overline{Mn} := \operatorname{Min}(\mathbf{En} \cap (\{(\sup E') - (1,\ldots,1) \mid E' \subseteq Mn\} \cup \{e(i) \in \mathbf{En}_\infty \mid (e(i))_i = (\inf Mn)_i - 1 \land \forall j \neq i.\ (e(i))_j = \infty\}))$
has the complement energy space as its downset.
$\mathsf{Win}_d^{\max}((p,\{q\})_a) = \overline{\mathsf{Win}_a^{\min}((p,\{q\})_a)}$ characterizes all preordering formula
languages and thus equivalences defined in terms of expressiveness prices for $p$ and
$q$. This might contain multiple, incomparable, notions from the spectrum. Taking
both directions, $\mathsf{Win}_d^{\max}((p,\{q\})_a) \cup \mathsf{Win}_d^{\max}((q,\{p\})_a)$ will thus characterize
the finest intersection of equivalences to equate $p$ and $q$.

If we just wonder which of the equivalences from the spectrum hold, we may 
establish this more directly by checking which of them are not dominated by 
attacker wins. 
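
Such a direct check follows Theorem 1 and can be sketched in Python. The function name `holding_notions`, the two coordinates, and the sample budget are illustrative assumptions; the authoritative coordinates are those of Fig. 3.

```python
INF = float("inf")


def leq(a, b):
    """Pointwise order on energy vectors."""
    return all(x <= y for x, y in zip(a, b))


def holding_notions(min_attacker_budgets, coordinates):
    """Names of notions X whose coordinate e_X is not dominated by an attacker win.

    By Theorem 1, p is X-preordered to q precisely if no minimal attacker
    winning budget lies at or below e_X.
    """
    return [
        name
        for name, e_x in coordinates.items()
        if not any(leq(e, e_x) for e in min_attacker_budgets)
    ]


# Hypothetical coordinates for two notions and one hypothetical minimal budget.
coordinates = {
    "trace": (INF, 0, 0, 0, 0, 0),
    "bisimulation": (INF, INF, INF, INF, INF, INF),
}
budgets = {(1, 1, 0, 0, 1, 1)}
result = holding_notions(budgets, coordinates)
```

Here the only attacker budget needs a conjunction and a negation, so it does not fit below the trace coordinate; the trace preorder therefore holds in this toy input, while the bisimulation coordinate is dominated.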

From the information, we can also easily build witness relations to certify
that we return sound equivalence results. In particular, the pairs won with
arbitrary attacker budgets, $\{(p,q) \mid (\infty,\infty,\infty,\infty,\infty,\infty) \in \mathsf{Win}_d^{\max}((p,\{q\})_a)\}$, form
a bisimulation. Similarly, the strategy formulas of Definition 11 can directly be
computed to explain inequivalence.

If we use symbolic winning budgets capped as proposed at the end of
Subsect. 4.2, the formula reconstruction will be harder and the $\overline{\mathsf{Win}_a^{\min}((p,\{q\})_a)}$
might be below the maximal defender winning budgets if these exceed the bound.
But this will not matter as long as we choose a cap beyond the natural numbers
that characterize our spectrum.


5 Exploring Minimizations 


Our algorithm can be used to analyze the equivalence structure of moderately- 
sized real-world transition systems. In this section, we take a brief look at its 
performance on the VLTS (“very large transition systems”) benchmark suite [18] 
and return to our initial Peterson example. 

The energy spectroscopy algorithm has been added to the Linear-Time–
Branching-Time Spectroscope of [7] and can be tried on transition systems at
https://equiv.io/.

Table 1 reports the results of running the implementation of [7] and this
paper's implementation in variants using the spectroscopy energy game $\mathcal{G}_\triangle$ and
the clever spectroscopy energy game $\mathcal{G}_\blacktriangle$. We tested on the VLTS examples of
up to 25,000 states and the Peterson example (Fig. 1). The table lists the
$P$-sizes of the input transition systems and of their bisimilarity quotient system
$P_{/\sim_B}$. The spectroscopies have been performed on the bisimilarity quotient
systems by constructing the game graph underneath positions comparing all pairs
of enabledness-equivalent states. The middle three groups of columns list the
resource usage for the three implementations using: the [7]-spectroscopy, the
energy game $\mathcal{G}_\triangle$, and the clever game $\mathcal{G}_\blacktriangle$. For each group, the first column
reports traversed game size, and the second gives the time the spectroscopy
took in seconds. Where the tests ran out of memory or took longer than five
minutes (in the Java Virtual Machine with 8 GB heap space, at 4 GHz,
single-threaded), the cells are left blank. The last three columns list the output sizes of
state spaces reduced with respect to enabledness $\sim_E$, traces $\sim_T$, and simulation
$\sim_{1S}$. As one would hope, all three algorithms returned the same results.
From the output, we learn that the VLTS examples, in a way, lack diversity:
Bisimilarity $\sim_B$ and trace equivalence $\sim_T$ mostly coincide on the systems (third
and penultimate column).

Concerning the algorithm itself, the experiments reveal that the computation
time grows mostly linearly with the size of the game move graph. Our algorithm
can deal with bigger examples than [7] (which fails at peterson, vasy_10_56
and cwi_1_2, and takes more than 500 s for vasy_8_24). Even where [7] has
a smaller game graph (e.g. cwi_3_14), the exponential formula construction
renders it slower. Also, the clever game graph $\rightarrowtail_\blacktriangle$ indeed is much smaller than
$\rightarrowtail_\triangle$ for examples with a lot of nondeterminism such as peterson.


Table 1. Sample systems, sizes, and benchmark results. 


system     | P      | P/~B   | ↣ ([7]) | t/s    | ↣△      | t/s   | ↣▲         | t/s    | P/~E         | P/~T         | P/~1S
peterson   | 19     | 19     |         |        | 348,474 | 23.31 | 2,363      | 0.15   | 3            | 11           | 11
vasy_0_1   | 289    | 9      | 1,118   | 0.17   | 1,334   | 0.02  | 566        | 0.02   | 1            | 9            | 9
vasy_1_4   | 1,183  | 28     | 1,125   | 0.05   | 1,320   | 0.02  | 1,000      | 0.02   | 8            | 28           | 28
vasy_5_9   | 5,486  | 145    | 3,789   | 0.14   | 4,315   | 0.05  | 2,988      | 0.06   | 109          | 145          | 145
vasy_8_24  | 8,879  | 416    | 513,690 | 540.96 | 725,113 | 10.48 | 145,965    | 2.15   | [illegible]  | 415          | 415
vasy_8_38  | 8,921  | 219    | 19,595  | 0.78   | 19,690  | 0.21  | 14,958     | 0.19   | 112          | 218          | 218
vasy_10_56 | 10,849 | 2,112  |         |        |         |       | 6,012,676  | 174.59 | [illegible]  | [illegible]  | [illegible]
vasy_18_73 | 18,746 | 4,087  |         |        |         |       |            |        |              |              |
vasy_25_25 | 25,217 | 25,217 | 100,866 | 1.15   | 0       | 0.32  | 0          | 0.33   | 25,217       | 25,217       | 25,217
cwi_1_2    | 1,952  | 1,132  |         |        |         |       | 22,723,369 | 384.13 | 9            | 1,132        | 1,132
cwi_3_14   | 3,996  | 62     | 14,761  | 2.48   | 25,666  | 0.28  | 18,350     | 0.3    | 2            | 62           | 62

Of those terminating, the heavily nondeterministic cwi_1_2 is the most 
expensive example. As many coarse notions must record the nondeterministic 
options, this blowup is to be expected. If we compare to the best similarity algo- 
rithm by Ranzato and Tapparo [29], they report their algorithm SA to tackle 
cwi_1_2 single-handedly. Like our implementation, the prototype of SA [29] ran 
out of memory while determining similarity for vasy_18_73. This is in spite 
of SA theoretically having optimal complexity and similarity being less com- 
plex (cubic) than trace equivalence, which we need to cover. The benchmarks 
in [29] failed at vasy_10_56 and vasy_25_25, which might be due to 2010’s
tighter memory requirements (they used 2 GB of RAM) or the degree to which 
bisimilarity and enabledness in the models is exploited. 


6 Conclusion and Related Work 


This paper has connected two strands of research in the field of system analysis: 
The strand of equivalence games on transition systems starting with Stirling’s 
bisimulation game [7,12,32,33] and the strand of energy games for systems of 
bounded resources [2,10,11, 14-16, 27, 30, 34]. 
The connection rests on the insight that levels of equivalence correspond
to resources available to an attacker who tries to tell two systems apart. This 
parallel is present in recent work within the security domain [25] just as much as 
in the first thoughts on observable nondeterminism by Hennessy and Milner [24]. 

The paper has not examined the precise relationship of the games of Sect. 3 
to the whole zoo of VASS, energy, mean-payoff, monotonic [1], and counter 
games. The spectroscopy energy game deviates slightly from common multi- 
energy games due to $\min_D$-updates and due to the attacker being energy-bound
(instead of the defender). As the energies cannot be exhausted by defender 
moves, the game can also be interpreted as a VASS game [2,10] where the 
attacker is stuck if they run out of energy. Our algorithm complexity matches 
that of general lower-bounded N-dimensional energy games [15]. Links between 
our declining energy games and other games from the literature might enable 
slight improvements of the algorithm. For instance, reachability in VASS games 
can turn polynomial [11]. 

In the strand of generalized game characterizations for equivalences [7, 12, 32], 
this paper extends applicability for real-world systems. The implementation per- 
forms on par with the most efficient similarity algorithm [29]. Given that among 
the hundreds of equivalence algorithms and tools most primarily address bisimi- 
larity [19], a tool for coarser equivalences is a worthwhile addition. Although our 
previous algorithm [7] is able to solve the spectroscopy problem, its reliance on 
super-exponential partitions of the state space makes it ill-fit for transition sys- 
tems with significant nondeterminism. In comparison, our new algorithm also 
needs one less layer of complexity because it determines equivalences without 
constructing distinguishing formulas. 

These advances enable a spectroscopy of systems saturated by weak transi- 
tions. We can thus analyze weak equivalences such as in the example of Peter- 
son’s mutex. For special weak equivalences without a strong counterpart such as 
branching bisimilarity [22], deeper changes to the modal logic are necessary [6]. 

The increased applicability has allowed us to exhaustively consider equiva- 
lences on the smaller systems of the widely-used VLTS suite [18]. The exper- 
iments reveal that the spectrum between trace equivalence and bisimilarity 
mostly collapses for the examined systems. It may often be reasonable to spec- 
ify systems in such a way that the spectrum collapses. In a benchmark suite, 
however, a lack of semantic diversity can be problematic: For instance, other- 
wise sensible techniques like polynomial-time reductions [13] will not speed up 
language inclusion testing, and nuances of the weak equivalence spectrum [8] 
will falsely seem insignificant. One may also overlook errors and performance 
degradations that appear only for transition systems where equal traces do not 
imply equivalent branching behavior. We hope this blind spot does not affect 
the validity of any of the numerous studies relying on VLTS benchmarks. 


Acknowledgments. This work benefited from discussion with Sebastian Wolf, with 
group at TU Berlin, as well as from reviewer comments.
Data Availability. Proofs and updates are to be found in the report version of this
paper [5]. The Scala source is on GitHub: https://github.com/benkeks/equivalence-fiddle/.
A webtool implementing the algorithm runs on https://equiv.io/. An artifact
including the benchmarks is archived on Zenodo [4].
Abstract. This paper explores how using commutativity can improve
the efficiency and efficacy of algorithmic termination checking for concur- 
rent programs. If a program run is terminating, one can conclude that 
all other runs equivalent to it up-to-commutativity are also terminat- 
ing. Since reasoning about termination involves reasoning about infinite 
behaviours of the program, the equivalence class for a program run may 
include infinite words with lengths strictly larger than $\omega$ that capture
the intuitive notion that some actions may soundly be postponed indefi- 
nitely. We propose a sound proof rule which exploits these as well as clas- 
sic bounded commutativity in reasoning about termination, and devise 
a way of algorithmically implementing this sound proof rule. We present 
experimental results that demonstrate the effectiveness of this method 
in improving automated termination checking for concurrent programs. 


1 Introduction 


Checking termination of concurrent programs is an important practical problem 
and has received a lot of attention [3,29,35,37]. A variety of interesting tech- 
niques, including thread-modular reasoning [10,34,35,37], causality-based rea- 
soning [29], and well-founded proof spaces [15], among others, have been used 
to advance the state of the art in reasoning about concurrent program termi- 
nation. Independently, it has been established that leveraging commutativity in 
proving safety properties can be a powerful tool in improving automated check- 
ers [16-19]. There are many instances of applications of Lipton’s reductions [32] 
in program reasoning [14,28]. Commutativity has been used to simultaneously 
search for a program with a simple proof and its safety proof [18,19] and to 
improve the efficiency and efficacy of assertion checking for concurrent programs 
[16]. Recently [17], abstract commutativity relations were formalized and combined
to increase the power of commutativity in algorithmic verification.

This paper investigates how using commutativity can improve the efficiency 
and efficacy of proving the termination of concurrent programs as an
enhancement to existing techniques. The core idea is simple: if we know that a program
run $\rho a b \rho'$ is terminating, and we know that $a$ and $b$ commute, then we can
conclude that $\rho b a \rho'$ is also terminating. Let us use an example to make this idea
concrete for termination proofs of concurrent programs. Consider the two thread 
© The Author(s) 2023 
Producer Thread:                  Consumer Thread:

while (i < producer_limit) {      while (j < consumer_limit) {
  C++; // produce content           C--; // consume content
  i++;                            }
}
barrier++;

Fig. 1. Producer/Consumer Template 


templates in Fig. 1: one for a producer thread and one for a consumer thread, 
where i and j are local variables. The assumption is that barrier and the local 
counters i and j are initialized to 0. The producer generates content (modelled 
by incrementing of a global counter C++) up to a limit and then, using barrier, 
signals the consumer to start consuming. Independent of the number of produc- 
ers and consumers, this synchronization mechanism ensures that the consumers 
wait for all producers to finish before they start consuming. Note that the pro- 
ducer threads fully commute—each statement in a producer commutes with each 
statement in another. A producer and consumer only partially commute. 

In a program with only two producers, a human would argue at the high level 
that the independence of producer loops implies that their parallel composition 
is equivalent, up to commutativity, to their sequential composition. Therefore, it 
suffices to prove that the sequential program terminates. In other words, it should 
suffice to prove that each producer terminates. Let us see how this high level 
argument can be formalized using commutativity reasoning. Let $\lambda_1$ and $\lambda_2$ stand
for the loop bodies of the two producers. Among others, consider the (syntactic)
concurrent program run $(\lambda_1\lambda_2)^\omega$; this run may or may not be feasible. Since $\lambda_1$
and $\lambda_2$ commute, we can transform this run, by making infinitely many swaps,
to the run $\lambda_1^\omega\lambda_2^\omega$. The model checking expert would consider this transformation
rather misguided: it appears that we are indefinitely postponing $\lambda_2$ in favour of
$\lambda_1$. Moreover, a word with a length strictly larger than $\omega$, called a transfinite
word, does not have an appropriate representation in language theory because
it does not belong to $\Sigma^\omega$. Yet, the observation that $(\lambda_1\lambda_2)^\omega = \lambda_1^\omega\lambda_2^\omega$ is the
key to a powerful proof rule for termination of concurrent programs: If $\lambda_1^\omega$ is
terminating and $\lambda_1$ commutes against $\lambda_2$, then we can conclude that $(\lambda_1\lambda_2)^\omega$ is
terminating. In other words, the termination proof for the first producer loop
implies that all interleaved executions of two producers terminate, without the
need for a new proof. Note that the converse is not true; termination of $\lambda_1^\omega\lambda_2^\omega$
does not necessarily imply the termination of $\lambda_2^\omega$. So, even if we were to replace
the second producer with a forever loop, our observation would stand as is.
Hence, for the termination of the entire program (and not just the run $(\lambda_1\lambda_2)^\omega$),
one needs to argue about the termination of both $\lambda_1^\omega$ and $\lambda_2^\omega$, matching the
high level argument. In Sect. 3, we formally state and prove this proof rule,
called the omega-prefix proof rule, and show how it can be incorporated into an 
algorithmic verification framework. Using this proof rule, the program consisting 
of N producers can be proved terminating by proving precisely N single-thread 
loops terminating. 

Now, consider adding a consumer thread to our two producer threads. The 
consumer loop is independent of the producer threads but the consumer thread, 
as a whole, is not. In fact, part of the work of a termination prover is to prove that
any interleaved execution of a consumer loop with either producer is infeasible 
due to the barrier synchronization and therefore terminating. Again, a human 
would argue that two such cases need to be considered: the consumer crosses 
the barrier with 0 or 1 producers having terminated. Each case involves several 
interleavings, but one should not have to prove them correct individually. Ideally, 
we want a mechanism that can take advantage of commutativity for both cases. 

Before we explore this further, let us recall an algorithmic verification tem- 
plate which has proven useful in incorporating commutativity into safety rea- 
soning [16-19] and in proving termination of sequential [25] and parameterized 
concurrent programs [15]. The work flow is illustrated in Fig. 2. The program and 
the proof are represented using (Büchi) automata, and module (d) (and
consequently module (a)) are implemented as inclusion checks between the languages
of these automata. The iteratively refined proof, a language of infeasible
syntactic program runs, can be annotated Floyd-Hoare style and generalized using
interpolation as in [25]. For module (b), any known technique for reasoning about
the termination of simple sequential programs can be used on lassos. 

The straightforward way to account for commutativity in this refinement 
loop would involve module (c): add to $\Pi$ all program runs equivalent to the
existing ones up to commutativity without having a proof for them. In the safety 
context, it is well-known that checking whether a program is subsumed by the 
commutativity closure of a proof is undecidable. We show (in Sect.3) that the 
same hurdle exists when doing inclusion checks for program termination. 

In the context of safety [16-19], program reductions were proposed as an 
antidote to this undecidability problem: rather than enlarging the proof, one 
reduces the program and verifies a new program with a subset of the original 
program runs while maintaining (at least) one representative for each commu- 
tativity equivalence class. These representatives are the lexicographically least 
members of their equivalence classes, and are algorithmically computed based 
on the idea of the sleep set algorithm [22] to construct the automaton for the 
reduced program. However, using this technique is not possible in termination 
reasoning where lassos, and not finite program runs, are the basic objects. 

To overcome this problem, we propose a different class of reductions, called 
finite-word reduction. Inspired by the classical result that an $\omega$-regular language
can be faithfully captured as a finite-word language for the purposes of certain 


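[Figure: flowchart of the refinement loop with the lettered modules referenced in the text. Recoverable box texts: "Does $\Pi$ subsume all program lassos?" (module (a)), "Generalize $\Pi$", "Add to proof $\Pi$", and the outcomes "Program is terminating." and "Program may not terminate."]

Fig. 2. Refinement Loop For Proving Termination.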
checks such as inclusion checks [4], we propose a novel way of translating both the
program and the proof into finite-word languages. The classical result is based on 
an exponentially sized construction and does not scale. We propose a polynomial 
construction that has the same properties for the purpose of our refinement loop. 
This contribution can be viewed as an efficient translation of termination analysis 
to safety analysis and is useful independent of the commutativity context. For the 
resulting finite-word languages, we propose a novel variation of the persistent set 
algorithm to reduce the finite-word program language. This reduction technique 
is aware of the lasso structure in finite words. 

Used together, finite-word reductions and omega-prefix generalization pro- 
vide an approximation of the undecidable commutativity-closure idea discussed 
above. They combine the idea of closures, from proof generalization schemes 
like [15] and reductions from safety [16], into one uniform proof rule that both 
reduces the program and generalizes the proof up to commutativity to take as 
much advantage as possible. Neither the reductions nor the generalizations are 
ideal, which is a necessity to maintain algorithmic computability. Yet, together, 
they can perform in a near optimal way in practice: for example, with 2 produc- 
ers and one consumer, the program is proved terminating by sampling precisely 
3 terminating lassos (1 for each thread) and 2 infeasible lassos (one for each 
barrier failure scenario). 

Finally, mostly out of theoretical interest, we explore a class of infinite word 
reductions that have the same theoretical properties as safety reductions, that is, 
they are optimal and their regularity (in this case, $\omega$-regularity) is guaranteed.
We demonstrate that if one opts for the Foata Normal Form (FNF) instead
of lexicographical normal form, one can construct optimal program reductions
in the style of [16,18,19] for termination checking. To achieve this, we use the
notion of the FNF of infinite words from (infinite) trace theory [13], and prove
the $\omega$-regular analogue of the classical result for regular languages: a reduction
consisting of only program runs in FNF is $\omega$-regular, optimal, and can be soundly
proved terminating in place of the original program (Sect. 3). 
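
For readers unfamiliar with the Foata Normal Form, the following Python sketch computes it for finite words only; the paper works with its generalization to infinite words, which this example does not cover. The function name `foata_normal_form` and the representation of the independence (commutativity) relation as a set of ordered pairs are assumptions made for the illustration.

```python
def foata_normal_form(word, independent):
    """Foata normal form of a finite word, as a list of sorted steps.

    `independent` is a symmetric, irreflexive set of letter pairs that
    commute. Each letter is placed in the step directly after the last
    step containing a letter it depends on.
    """
    steps = []
    for a in word:
        k = len(steps)
        # Move down while the step below contains only letters independent of a.
        while k > 0 and all((a, b) in independent for b in steps[k - 1]):
            k -= 1
        if k == len(steps):
            steps.append([])
        steps[k].append(a)
    return [sorted(step) for step in steps]


commuting = {("a", "b"), ("b", "a")}
fnf_interleaved = foata_normal_form("abab", commuting)
fnf_sequential = foata_normal_form("aabb", commuting)
```

Two finite words that are equivalent up to commutativity receive the same normal form, which is what makes the normal form usable as a unique representative of an equivalence class.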

To summarize, this paper proposes a way of improving termination checking 
for concurrent programs by exploiting commutativity to boost existing algorith- 
mic verification techniques. We have implemented our proposed solution in a 
prototype termination checker for concurrent programs called TERMUTE, and 
present experimental results supporting the efficacy of the method in Sect. 6.


2 Preliminaries 


2.1 Concurrent Programs 


In this paper, programs are languages over an alphabet of program statements
$\Sigma$. The control flow graph for a sequential program with a set of locations
$Loc$, and distinct entry and exit locations, naturally defines a finite automaton
$(Loc, \Sigma, \delta, entry, \{exit\})$. Without loss of generality, we assume that this
automaton is deterministic and has a single exit location. This automaton recognizes
a language of finite-length words. This is the set of all syntactic program runs 
that may or may not correspond to an actual program execution. 
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For the purpose of termination analysis, we are also interested in 
infinite-length program runs. Given a deterministic finite automaton Az, = 
(Q, ©’, ô, do, F) with no dead states, where £(A;) = L C X* is a regular language 
of finite-length syntactic program runs, we define Büchi( Az) = (Q, X, ô, qo, Q), 
a Büchi automaton recognizing the language LY = {u € ©” : Vu € pref (u)v € 
pref (L)}, where pref(u) denotes {w € X* : dw’ € L*UL*.w-w’ = u} and 
pref (L) = U,er pref (v). These are all syntactic infinite program runs that may 
or may not correspond to an actual program execution. 

We represent concurrency via interleaving semantics. A concurrent program 
is a parallel composition of a fixed number of threads, where each thread is 
a sequential program. Each thread P_i is recognized by an automaton A_{P_i} =
(Loc_i, Σ_i, δ_i, entry_i, {exit_i}). We assume the Σ_i's are disjoint. The DFA recognizing
P = P_1 || ... || P_n is constructed using the standard product construction for a
DFA A_P recognizing the shuffle of the languages of the individual thread DFAs.
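
The product construction can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the dictionary encoding of thread DFAs, the helper names `shuffle_product` and `accepts`, and the two toy threads are hypothetical and not taken from the paper.

```python
def shuffle_product(threads):
    """Explore the reachable product of thread DFAs (interleaving semantics).

    Each thread is a dict with 'entry', 'exit', and 'delta', where 'delta'
    maps (location, statement) to the next location. Statement alphabets
    are assumed to be pairwise disjoint.
    """
    entry = tuple(t["entry"] for t in threads)
    exit_ = tuple(t["exit"] for t in threads)
    delta, seen, worklist = {}, {entry}, [entry]
    while worklist:
        state = worklist.pop()
        for i, t in enumerate(threads):
            for (loc, stmt), nxt in t["delta"].items():
                if loc != state[i]:
                    continue
                # Thread i moves; every other thread stays where it is.
                succ = state[:i] + (nxt,) + state[i + 1:]
                delta[(state, stmt)] = succ
                if succ not in seen:
                    seen.add(succ)
                    worklist.append(succ)
    return {"entry": entry, "exit": exit_, "delta": delta, "states": seen}


def accepts(dfa, word):
    """Check whether a finite word is a syntactic run ending at the exit."""
    state = dfa["entry"]
    for stmt in word:
        state = dfa["delta"].get((state, stmt))
        if state is None:
            return False
    return state == dfa["exit"]


# Two single-loop threads: 'a'/'b' are loop bodies, 'x1'/'x2' are loop exits.
t1 = {"entry": 0, "exit": 1, "delta": {(0, "a"): 0, (0, "x1"): 1}}
t2 = {"entry": 0, "exit": 1, "delta": {(0, "b"): 0, (0, "x2"): 1}}
product = shuffle_product([t1, t2])
```

In this toy instance, `accepts(product, ["a", "b", "x1", "b", "x2"])` holds because the word is an interleaving of one run of each thread, whereas `["a", "x1"]` is rejected because the second thread never reaches its exit.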

The language of infinite runs of this concurrent program, denoted P^ω, is the
language recognized by Büchi(A_P). Note that a word in the language P^ω may
not necessarily be the shuffle of infinite runs of its individual threads.

\[
P^{\omega} = \{\, u \in \Sigma^{\omega} \mid \exists i : u|_{\Sigma_i} \in P_i^{\omega} \,\wedge\, \forall j : u|_{\Sigma_j} \in \mathit{pref}(P_j) \cup P_j^{\omega} \,\}
\]


In the rest of the paper, we will simply write P when we mean P^ω for brevity.
Note that P^ω includes unfair program runs, for example those in which individual
threads can be indefinitely starved. As argued in [15], this can be easily fixed by
intersecting P^ω with the set of all fair runs.


2.2 Termination 


Let X be the domain of the program state, Σ a set of statements, and denote by
⟦−⟧ : Σ* → 𝒫(X × X) a function which maps a sequence of statements to a
relation over the program state, satisfying ⟦s_1⟧⟦s_2⟧ = ⟦s_1 · s_2⟧ for all s_1, s_2 ∈ Σ*.
Define sequential composition of relations in the usual fashion: r_1 r_2 = {(x, y) :
∃z. (x, z) ∈ r_1 ∧ (z, y) ∈ r_2}. We write s(x) to denote {y : (x, y) ∈ ⟦s⟧} ⊆ X.

We say that an infinite sequence of statements τ ∈ Σ^ω is infeasible if and
only if ∀x ∈ X ∃k ∈ ℕ. s_1 ... s_k(x) = ∅, where s_i is the ith statement in the
run τ. A program, that is, an ω-regular language P ⊆ Σ^ω, is terminating if all of its
infinite runs are infeasible.


\[
\frac{\forall \tau \in P.\ \tau \text{ is infeasible}}{P \text{ is terminating}} \quad (\text{TERM})
\]
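
To make the definition of infeasibility concrete, the following Python sketch decides it for a single lasso over a finite state domain. The finite domain, the encoding of a statement as a dictionary from states to successor sets, and the names `image`, `run_image`, and `lasso_infeasible` are assumptions made for this illustration only; the paper's setting does not restrict the domain to be finite.

```python
def image(stmt, states):
    """Image of a set of states under one statement (dict: state -> set of states)."""
    out = set()
    for x in states:
        out |= stmt.get(x, set())
    return out


def run_image(word, states):
    """Image of a set of states under a finite sequence of statements."""
    for stmt in word:
        states = image(stmt, states)
    return states


def lasso_infeasible(stem, loop, domain):
    """Decide whether stem . loop^omega is infeasible over a FINITE domain.

    For each start state we iterate the loop on the current image set. The
    sequence of image sets either reaches the empty set (some prefix has no
    execution from that start state) or revisits a non-empty set, in which
    case every prefix stays feasible from that start state.
    """
    for x in domain:
        current = run_image(stem, {x})
        seen = set()
        while current and frozenset(current) not in seen:
            seen.add(frozenset(current))
            current = run_image(loop, current)
        if current:
            return False
    return True


# Toy statements over the domain {0, 1, 2, 3} for a counter i.
guard = {i: {i} for i in range(3)}            # assume i < 3
inc = {i: {min(i + 1, 3)} for i in range(4)}  # i++ (saturating at 3)
skip = {i: {i} for i in range(4)}             # no-op
```

Here `lasso_infeasible([], [guard, inc], range(4))` holds, since the guard eventually blocks from every start state, while replacing `inc` by `skip` yields a feasible lasso.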


Lassos. It is not possible to effectively represent all infinite program runs, but
we can opt for a slightly more strict rule by restricting our attention to ultimately
periodic runs UP ⊆ Σ^ω. That is, runs that are of the form uv^ω for some finite
words u, v ∈ Σ*. These are also typically called lassos.

It is unsound to replace all runs with all ultimately periodic runs in rule
TERM. P may be non-terminating while all its ultimately periodic runs are
terminating. Assume that our program P is an ω-regular language and there is a
universe 𝒯 of known terminating programs that are all ω-regular languages.
Then, we get the following sound rule instead:

\[
\frac{\exists \Pi \in \mathcal{T}.\ P \subseteq \Pi}{P \text{ is terminating}} \quad (\text{TERMUP})
\]


If the inclusion P ⊆ Π does not hold, then it is witnessed by an ultimately
periodic run [4]. In a refinement loop in the style of Fig. 2, one can iteratively
expand Π based on this ultimately periodic witness (a.k.a. a lasso), and hence
have a termination proof construction scheme in which ultimately periodic runs
(lassos) are the only objects of interest. Note that if P includes unfair runs of a
concurrent program, rather than fixing it, one can instead initialize Π with all
the unfair runs of the concurrent program, which is an ω-regular language. This
way, the rule becomes a fair termination rule.


2.3 Commutativity and Traces 


An independence (or commutativity) relation I ⊆ Σ × Σ is a symmetric, anti-reflexive
relation that captures the commutativity of a program's statements:
(s_1, s_2) ∈ I ⟹ ⟦s_1 s_2⟧ = ⟦s_2 s_1⟧. In what follows, assume such an I is fixed.


Finite Traces. Two finite words w_1 and w_2 are equivalent whenever we can
apply a finite sequence of swaps of adjacent independent program statements to
transform w_1 into w_2. Formally, an independence relation I on statements gives
rise to an equivalence relation ≡_I on words by defining ≡_I to be the reflexive
and transitive closure of the relation ∼_I, defined as u s_1 s_2 v ∼_I u s_2 s_1 v ⟺
(s_1, s_2) ∈ I. A Mazurkiewicz trace [u]_I = {v ∈ Σ* : v ≡_I u} is the corresponding
equivalence class; we use "trace" exclusively to denote Mazurkiewicz traces.
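
As a small illustration of this definition, the following Python sketch enumerates a finite trace by exhaustively applying swaps of adjacent independent letters. The function name `trace_class` and the encoding of I as a set of ordered pairs of single-character statements are assumptions made for the example.

```python
from collections import deque


def trace_class(word, independent):
    """Enumerate [word]_I by breadth-first search over adjacent swaps."""
    start = tuple(word)
    seen = {start}
    queue = deque([start])
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in independent:
                # Swap the two adjacent independent letters.
                swapped = w[:i] + (w[i + 1], w[i]) + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    queue.append(swapped)
    return {"".join(w) for w in seen}


I = {("a", "b"), ("b", "a")}
abcba_class = trace_class("abcba", I)
```

With I = {(a, b), (b, a)}, the class of abcba contains exactly four words, because only the leading ab and the trailing ba can be reordered around the letter c.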


Infinite Traces. Traces may also be defined in terms of dependence graphs (or
partial orders). Given a word τ = s_1 s_2 ..., the dependence graph corresponding
to τ is a labelled, directed, acyclic graph G = (V, E) with labelling function
L : V → Σ and vertices V = {1, 2, ...}, where L(i) = s_i, and (i, i') ∈ E whenever
i < i' and (L(i), L(i')) ∉ I. Then, [τ]^∞_I, the equivalence class of the infinite
word τ, is precisely the set of linear extensions of G. Therefore, τ' ≡_I τ iff τ'
is a linear extension of G. For example, Fig. 3(i) illustrates the Hasse diagram
of the finite trace [abcba]_I, and Fig. 3(ii), the Hasse diagram of the infinite
trace [abc(ab)^ω]^∞_I, where I = {(a, b), (b, a)}. For an infinite word τ, the
infinite trace [τ]^∞_I may contain linear extensions that do not correspond to
any word in Σ^ω. For example, if I = {(a, b), (b, a)}, then the trace
[(ab)^ω]^∞_I includes a member (infinite word) in which all as appear before all bs.
We use a^ω b^ω to denote this word and call such words transfinite. This means
that [τ]^∞_I ⊄ Σ^ω, even for an ultimately periodic τ.

Fig. 3. Hasse diagrams.


Normal Forms. A trace, as an equivalence class, may be represented unam- 
biguously by one of its member words. Lexicographical normal forms [13] are the 
most commonly used normal forms, and the basis for the commonly known sleep 
set algorithm in partial order reduction [22]. Foata Normal Forms (FNF) are 
less well-known and are used in the technical development of this paper: 


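Definition 1 (Foata Normal Form of a finite trace [13]). For a finite trace
t, define FNF(t) as a sequence of sets S_1 S_2 ... S_k (for some k ∈ ℕ) where
t = ∏_{i=1}^{k} S_i and for all i:

\[
\begin{array}{ll}
\forall a, b \in S_i.\ a \neq b \implies (a, b) \in I & \text{(no dependencies in } S_i\text{)} \\
\forall b \in S_{i+1}\ \exists a \in S_i.\ (a, b) \notin I & \text{(} S_i \text{ dependent on } S_{i+1}\text{)}
\end{array}
\]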


Given a trace's dependence graph, the FNF can be constructed by repeatedly
removing sets of minimal elements, that is, sets with no incoming edges.
Although we have defined a trace's FNF as a sequence of sets, we will generally
refer to a trace's FNF as a word in which the elements in each set are assumed
to be ordered lexicographically. For example, FNF([abcba]_I) = ab · c · ab, where
I = {(a, b), (b, a)}. We overload this notation by writing FNF([u]_I) as FNF(u),
and, for a language L, FNF(L) = {FNF(u) : u ∈ L}.
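
The construction can be sketched in Python as follows. Instead of building the dependence graph explicitly, this sketch assigns each letter the index of its Foata step as one plus the largest step index of any earlier dependent letter, which is equivalent to repeatedly peeling off minimal elements. The function name `foata_normal_form` and the pair-set encoding of I are assumptions of the example.

```python
def foata_normal_form(word, independent):
    """Return the FNF of a finite word as a list of lexicographically sorted steps."""
    levels = []  # levels[i] is the index of the step that holds word[i]
    for i, a in enumerate(word):
        level = 0
        for j in range(i):
            # Dependent letters (including equal letters) must be in earlier steps.
            if (word[j], a) not in independent:
                level = max(level, levels[j] + 1)
        levels.append(level)
    steps = [[] for _ in range(max(levels, default=-1) + 1)]
    for a, level in zip(word, levels):
        steps[level].append(a)
    return [sorted(step) for step in steps]


I = {("a", "b"), ("b", "a")}
fnf_abcba = foata_normal_form("abcba", I)
```

For the running example this yields the steps ab, c, ab, and every other member of [abcba]_I, such as bacab, produces the same result, as expected of a normal form.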


Theorem 1 ([13]). L is a regular language iff the set of its Foata (respectively
Lexicographical) normal forms is a regular language.


3 Closures and Reductions 


Commutativity defines an equivalence relation ≡_I which preserves the termination
of a program run.


Proposition 1. For τ, τ' ∈ Σ^ω and τ' ≡_I τ, τ is terminating iff τ' is terminating.


In the context of a refinement loop in the style of Fig. 2, Proposition 1 suggests
one can take advantage of commutativity by including all runs that are equivalent
to the ones in Π (which are already proved terminating) in module (c). We
formally discuss this strategy next.

Given a language L and an independence relation I, define [L]^∞_I = ⋃_{τ ∈ L} [τ]^∞_I.
Recall from Sect. 2 that, in general, [τ]^∞_I ⊄ Σ^ω. Since programs are represented
by ω-regular languages in our formalism, it is safe for us to exclude transfinite
words from [τ]^∞_I when computing commutativity closures. Define:

\[
[L]^{\omega}_I = \bigcup_{\tau \in L} [\tau]^{\infty}_I \cap \Sigma^{\omega} \qquad (\omega\text{-closure})
\]

The following sound proof rule is a straightforward adaptation of Rule
TERMUP that takes advantage of commutativity-based proof generalization:

\[
\frac{\exists \Pi \in \mathcal{T}.\ P \subseteq [\Pi]^{\omega}_I}{P \text{ is terminating}} \quad (\text{TERMCLOSURE})
\]


Recall the example from Sect. 1 with two producers. The transfinite program
run λ_1^ω λ_2^ω that is the sequential composition of the two producers looping forever
back to back does not belong to the ω-closure of any ω-regular language. We
generalize the notion of ω-closure to incorporate the idea of such runs in a new
proof rule.

Let τ be a transfinite word (like a^ω b^ω). Let τ' be a prefix of τ. If |τ'| = ω, we
say that τ' is an ω-prefix of τ, or τ' ∈ pref_ω(τ). A direct definition for when a
transfinite word τ is terminating would be rather contrived, since a word such as
a^ω b^ω does not correspond to a program execution in the usual sense. However,
a very useful property arises when considering the ω-words of pref_ω(τ): if an
ω-prefix τ' of a transfinite word τ is terminating, then all words in [τ]^ω_I are
terminating.




Theorem 2 (Omega Prefix Proof Rule). Let τ', τ'' ∈ Σ^ω and let τ be a transfinite
word. If τ ≡_I τ'' and τ' ∈ pref_ω(τ), then τ' terminates ⟹ τ'' terminates.


Remark that [τ]^ω_I ⊆ Σ^ω, so the former theorem uses the usual definition of
termination, i.e. termination of ω-words; however, this theorem implicitly defines
a notion of termination for some transfinite words.

Define [τ]^{ωp}_I, the omega-prefix closure of τ, as

\[
[\tau]^{\omega p}_I = [\tau]^{\omega}_I \;\cup \bigcup_{\tau'.\ \tau \in \mathit{pref}_{\omega}(\tau')} [\tau']^{\omega}_I
\]

Theorem 2 guarantees that, if τ terminates, then all of [τ]^{ωp}_I terminates. The
converse, however, does not necessarily hold: [τ]^{ωp}_I is not an equivalence class.


Example 1. Continuing the example in Fig. 1, recall that λ_1 and λ_2 are independent.
Let us assume we have a proof that λ_1^ω is terminating. The class
[λ_1^ω]^ω_I = {λ_1^ω} does not include any other members and therefore we cannot
conclude the termination status of any other program runs based on it. On the other
hand, since λ_1^ω ∈ pref_ω(λ_1^ω λ_2^ω) and [(λ_1 λ_2)^ω]^ω_I = [λ_1^ω λ_2^ω]^ω_I, we have
(λ_1 λ_2)^ω ∈ [λ_1^ω]^{ωp}_I. Therefore, we can conclude that (λ_1 λ_2)^ω is also
terminating. Note that λ_2^ω can be non-terminating and the argument still stands.


One can replace the closure in Rule TERMCLOSURE with omega-prefix closure 
and produce a new, more powerful, sound proof rule. There is, however, a major 
obstacle in the way of an algorithmic implementation of Rule TERMCLOSURE 
with either closure scheme: the inclusion check in the premise is not decidable. 




Proposition 2. [L]^ω_I and [L]^{ωp}_I for an ω-regular language L may not be
ω-regular. Moreover, it is undecidable to check the inclusions L_1 ⊆ [L_2]^ω_I and
L_1 ⊆ [L_2]^{ωp}_I for ω-regular languages L_1 and L_2.


3.1 The Compromise: A New Proof Rule 


In the context of safety verification, with an analogous problem, a dual approach 
was proposed as a way forward [18] based on program reductions. 


Definition 2 (ω-Reduction and ωp-Reduction). A language R ⊆ P is an
ω-reduction (resp. ωp-reduction) of program P under independence relation
I iff for all τ ∈ P there is some τ' ∈ R such that τ ∈ [τ']^ω_I (resp. τ ∈ [τ']^{ωp}_I).


The idea is that a program reduction can be soundly proven in place of the 
original program but, with strictly fewer behaviours to prove correct, less work 
has to be done by the prover. 


Proposition 3. Let P be a concurrent program and Π be ω-regular. We have:


- P ⊆ [Π]^ω_I iff there exists an ω-reduction R of P under I such that R ⊆ Π.
- P ⊆ [Π]^{ωp}_I iff there exists an ωp-reduction R of P under I such that R ⊆ Π.


An ω/ωp-reduction R may not always be ω-regular. However, Proposition 3
puts forward a way for us to make a compromise to rule TERMCLOSURE for the
sake of algorithmic implementability. Consider a universe of program reductions
Red(P), which does not include all reductions. This gives us a new proof rule:

\[
\frac{\exists \Pi \in \mathcal{T}.\ \exists R \in \mathit{Red}(P).\ R \subseteq \Pi}{P \text{ is terminating}} \quad (\text{TERMREDUC})
\]

If Red(P) is the set of all ω-reductions (resp. ωp-reductions), then Rule
TERMREDUC becomes logically equivalent to Rule TERMCLOSURE (resp. with
[Π]^{ωp}_I). By choosing a strict subset of all reductions for Red(P), we trade the
undecidable premise check of the proof rule TERMCLOSURE with a new decidable
premise check for Rule TERMREDUC. The specific algorithmic problem that this
paper solves is then the following: What are good candidates for Red(P) such
that an effective and efficient algorithmic implementation of Rule TERMREDUC
exists? Moreover, we want this implementation to show significant advantages
over the existing algorithms that implement the Rule TERMUP.

In Sect. 5, we propose Foata Reduction as a theoretically clean option for
Red(P) in the universe of all ω-reductions. In particular, they have the
algorithmically essential property that the reductions do not include any transfinite
words. In the universe of ωp-reductions, which does account for transfinite words,
such a theoretically clean notion does not exist. This paper instead proposes the
idea of mixing both closures and reductions as a best algorithmic solution for the
undecidable Rule TERMCLOSURE in the form of the following new proof rule:

\[
\frac{\exists \Pi \in \mathcal{T}.\ \exists R \in \mathit{Red}(P).\ R \subseteq [\Pi]^{opg}_I}{P \text{ is terminating}} \quad (\text{TERMOP})
\]


In Sect. 3.2, we introduce [Π]^{opg}_I as an underapproximation of [Π]^{ωp}_I that is
guaranteed to be ω-regular and computable. Then, in Sect. 4, we discuss how,
through a representation shift from infinite words to finite words, an appropriate
class of reductions for Red(P) can be defined and computed.


3.2 Omega Prefix Generalization 


We can implement the underapproximation of [Π]^{ωp}_I by generalizing the proof
of termination of each individual lasso in the refinement loop of Fig. 2. Let
u_1, ..., u_m, v_1, ..., v_{m'} ∈ Σ and consider the lasso uv^ω, where u = u_1 ... u_m,
v = v_1 ... v_{m'}, and m' > 0. Let A_{uv^ω} = (Q, Σ, δ, q_0, {q_m}) be a Büchi automaton
consisting of a stem and a loop, with a single accepting state q_m at the head of the
loop, recognizing the ultimately periodic word uv^ω; in [25], this automaton is
called a lasso module of uv^ω. Let Σ_{I_loop} ⊆ Σ = {a : {v_1, ..., v_{m'}} × {a} ⊆ I} be
the statements that are independent of the statements v_1, ..., v_{m'} of the loop,
and Σ_{I_stem} ⊆ Σ_{I_loop} = {a : {u_1, ..., u_m, v_1, ..., v_{m'}} × {a} ⊆ I} the statements
that are independent of all statements appearing in uv^ω.

Define OPG(A_τ) = (Q ∪ {q'}, Σ, δ_OPG, q_0, {q_m}) for a lasso τ = uv^ω where

\[
\delta_{OPG}(q, a) =
\begin{cases}
q & \text{if } q \in \{q_0, \ldots, q_{m-1}\} \wedge a \in \Sigma_{I_{stem}} \\
  & \text{or if } q \in \{q_{m+1}, \ldots, q_{m+m'-1}\} \cup \{q'\} \wedge a \in \Sigma_{I_{loop}} \\
q' & \text{if } q = q_m \wedge a \in \Sigma_{I_{loop}} \text{ or } m' = 1 \text{ and } a = v_1 \\
\delta(q_m, v_1) & \text{if } q = q' \wedge a = v_1 \\
\delta(q, a) & \text{o.w.}
\end{cases}
\]

We refer to the language L(OPG(A_τ)) recognized by this automaton as [τ]^{opg}_I
for short. Note that this construction is given for individual lassos; we may
generalize this to a (finite) set of lassos by simply taking their union. For a lasso
τ = uv^ω, OPG(A_τ) is a linearly-sized Büchi automaton whose language satisfies
the following:


Proposition 4. [τ]^{opg}_I ⊆ [τ]^{ωp}_I.


Intuitively, this holds because this automaton simply allows us to intersperse
the statements of uv^ω with independent statements; when considering the
Mazurkiewicz trace arising from a word interspersed as described, these added
independent statements may all be ordered after uv^ω, resulting in a transfinite
word with ω-prefix uv^ω.
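
The construction can be made concrete with the following Python sketch. It follows one reading of the transition function of OPG given above and should be taken as an illustration under assumptions rather than as the tool's implementation: the integer state names, the extra state name `"q'"`, the helper `accepts_lasso`, and the two-letter example with statements `l1` and `l2` are all hypothetical.

```python
Q_PRIME = "q'"  # the extra non-accepting state added by OPG


def opg(stem, loop, alphabet, independent):
    """Build the transitions of OPG for the lasso stem . loop^omega."""
    m, k = len(stem), len(loop)  # k plays the role of m' in the text
    assert k > 0, "the loop of a lasso must be non-empty"

    def independent_of_all(a, letters):
        return all((b, a) in independent for b in letters)

    sigma_loop = {a for a in alphabet if independent_of_all(a, loop)}
    sigma_stem = {a for a in sigma_loop if independent_of_all(a, stem)}

    # Lasso module: states 0..m follow the stem, states m..m+k-1 form the loop.
    delta = {(i, a): i + 1 for i, a in enumerate(stem)}
    for j, a in enumerate(loop):
        delta[(m + j, a)] = m + (j + 1) % k

    d = dict(delta)  # "otherwise" case: keep the lasso transitions
    for a in sigma_stem:
        for q in range(m):
            d[(q, a)] = q  # self-loops along the stem
    for a in sigma_loop:
        for q in list(range(m + 1, m + k)) + [Q_PRIME]:
            d[(q, a)] = q  # self-loops inside the loop and on q'
        d[(m, a)] = Q_PRIME  # leave the accepting head on a skipped letter
    d[(Q_PRIME, loop[0])] = delta[(m, loop[0])]  # resume the loop from q'
    if k == 1:
        d[(m, loop[0])] = Q_PRIME
    return d, 0, {m}


def accepts_lasso(d, init, accepting, u, v):
    """Buchi acceptance of u . v^omega for a deterministic transition map."""
    q = init
    for a in u:
        q = d.get((q, a))
        if q is None:
            return False
    seen, hits = {}, []  # loop-boundary state -> iteration index; per-iteration hits
    while q not in seen:
        seen[q] = len(hits)
        hit = False
        for a in v:
            q = d.get((q, a))
            if q is None:
                return False
            hit = hit or q in accepting
        hits.append(hit)
    # Accept iff an accepting state is entered inside the repeating part.
    return any(hits[seen[q]:])


I = {("l1", "l2"), ("l2", "l1")}
d, q0, acc = opg([], ["l1"], {"l1", "l2"}, I)
```

In this toy instance the generalized automaton for the lasso l1^ω also accepts (l1 l2)^ω, mirroring Example 1, but it rejects l2^ω, whose termination is not implied by the proof of l1^ω.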


Theorem 3. If τ is terminating, then every run in [τ]^{opg}_I is terminating.


This follows directly from Theorem 2 and Proposition 4, and concludes the
soundness and algorithmic implementability of Rule TERMOP if Red(P) = {P}.


4 Finite-Word Reductions


In this section, inspired by the program reductions used in safety verification, we 
propose a way of using those families of reductions to implement Red(P) in Rule 
TERMREDUC. This method can be viewed as a way of translating the liveness 
problem into an equivalent safety problem. 

In [4], a finite-word encoding of ω-regular languages was proposed that can
be soundly used for checking inclusion in the premise of rules such as Rule
TERMREDUC:


Definition 3 ($-language [4]). Let L ⊆ Σ^ω. Define the $-language of L as
$(L) = {u$v | u, v ∈ Σ* ∧ uv^ω ∈ L}.


If L is ω-regular, then $(L) is regular [4]. This is proved by construction, but
the one given in [4] is exponential. Since the Büchi automaton for a concurrent
program P is already large, an exponential blowup to construct $(P) can hardly
be tolerated. We propose an alternative polynomial construction.


4.1 Efficient Reduction to Safety 


Our polynomial construction, denoted by fast$, consists of linearly many copies
of the Büchi automaton recognizing the program language.


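Definition 4 (fast$). Given a Büchi automaton A = (Q, Σ, δ, q_0, F), define
fast$(A) = (Q_$, Σ ∪ {$}, δ_$, q_0, F_$) with Q_$ = Q ∪ (Q × Q × {0, 1}), F_$ =
{(q, q, 1) : q ∈ Q}, and for q, r ∈ Q, i ∈ {0, 1}:

\[
\begin{aligned}
\delta_{\$}(q, a) &= \delta(q, a) \\
\delta_{\$}(q, \$) &= \{(q, q, 0)\} \\
\delta_{\$}((q, r, i), a) &=
\begin{cases}
\{(q, r', 1) : r' \in \delta(r, a)\} & \text{if } i = 0 \text{ and } r \in F \\
\{(q, r', i) : r' \in \delta(r, a)\} & \text{o.w.}
\end{cases}
\end{aligned}
\]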


Let L be an ω-regular language and A be a Büchi automaton recognizing L.
We overload the notation and use fast$(L) to denote the language recognized by
fast$(A). Note that fast$(L), unlike $(L), is a construction parametric on the
Büchi automaton recognizing the language, rather than the language itself. In
general, fast$(L) under-approximates $(L). But, under the assumption that all
alphabet symbols of Σ label at most one transition in the Büchi automaton A
(recognizing L), we have fast$(L) = $(L). This condition is satisfied for any Büchi
automaton that is constructed from the control flow graph of a (concurrent)
program since we may treat each statement appearing on the graph as unique,
and these graph edges correspond to the transitions of the automaton.
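
The following Python sketch builds fast$ for a small Büchi automaton and checks a few $-words. It assumes that, before reading $, the construction simply follows the transitions of A, and that reading $ in state q moves to the copy (q, q, 0); the function names `fast_dollar` and `nfa_accepts` and the example automaton (which accepts the words over {a, b} with infinitely many a's) are hypothetical choices for this illustration.

```python
def fast_dollar(delta, q0, accepting, alphabet):
    """Build fast$(A); delta maps (state, letter) to a set of successor states."""
    states = {q0} | {q for (q, _) in delta}
    for succs in delta.values():
        states |= set(succs)
    d = {key: set(succs) for key, succs in delta.items()}  # stem: run A itself
    for q in states:
        d[(q, "$")] = {(q, q, 0)}  # remember q as the head of the loop
        for r in states:
            for i in (0, 1):
                for a in alphabet:
                    # The flag becomes 1 once the loop leaves an accepting state.
                    flag = 1 if (i == 0 and r in accepting) else i
                    succs = {(q, r2, flag) for r2 in delta.get((r, a), ())}
                    if succs:
                        d[((q, r, i), a)] = succs
    final = {(q, q, 1) for q in states}
    return d, q0, final


def nfa_accepts(d, q0, final, word):
    """Standard subset simulation of a finite-word automaton."""
    current = {q0}
    for a in word:
        current = set().union(*(d.get((q, a), set()) for q in current))
    return bool(current & final)


# Deterministic Buchi automaton for "infinitely many a's": state 1 is accepting.
buchi = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1}, (1, "b"): {0}}
d, q0, final = fast_dollar(buchi, 0, {1}, {"a", "b"})
```

The words a$a and $ab are accepted, and $b is rejected, as expected. The word $a is also rejected even though a^ω is in the language: the letter a labels two transitions of this automaton, so the run on a^ω does not close its loop after a single a. This is an instance of the under-approximation mentioned above.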


Theorem 4. For any ω-regular language L, we have fast$(L) ⊆ $(L). If P is a
concurrent program then fast$(P) = $(P).

First, let us observe that in Rule TERMUP, we can replace P with fast$(P)
and Π with fast$(Π) (and hence the universe 𝒯 with a correspondingly appropriate
universe) and derive a new sound rule.


Theorem 5. The finite word version of Rule TERMUP using fast$ is sound. 


The proof of Theorem 5 follows from Theorem 4. Using fast$, the program is
precisely represented and the proof is under-approximated; therefore, the inclusion
check implies the termination of the program.


4.2 Sound Finite Word Reductions 


With a finite word version of the Rule TERMUP, the natural question arises
if one can adopt the analogue of the sound proof rule used for safety [18] by
introducing an appropriate class of reductions for program termination in the
following proof rule:

\[
\frac{\exists \Pi \in \mathcal{T}.\ \exists R \in \mathit{Red}(\$(P)).\ R \subseteq \mathit{fast}\$(\Pi)}{P \text{ is terminating}} \quad (\text{FINITETERMREDUC})
\]


A language R is a sound reduction of $(P) if the termination of all ultimately
periodic words uv^ω, where u$v ∈ R, implies the termination of all ultimately
periodic words of P. Since, in u$v, the word u represents the stem of a lasso and
the word v represents its loop, it is natural to define equivalence, considering
the two parts separately, that is: u$v ≡_I u'$v' iff u' ≡_I u ∧ v' ≡_I v. One can use
any technique for producing reductions for safety, for example sleep sets for
lexicographical reductions [18], in order to produce a sound reduction that includes
representatives from this equivalence relation. Assume that $ does not commute
with any other letter in an extension I_$ of I over Σ ∪ {$} and observe that the
standard finite-length word Mazurkiewicz equivalence relation of u$v ≡_{I_$} u'$v'
coincides with u$v ≡_I u'$v' as defined above. Let FRed($(P)) be the set of all
such reductions. An algorithmic implementation of Rule FINITETERMREDUC
with Red($(P)) = FRed($(P)) may then be taken straightforwardly from the
safety literature.

Note, however, that reductions in FRed($(P)) are more restrictive than their
infinite analogues; for example, uv$v ∉ [u$v]_I, whereas uvv^ω = uv^ω and therefore
uvv^ω ≡_I uv^ω for any I. By treating a $-word of $(P) as a finite word without
recognizing its underlying lasso structure, every word uv^ω in the program necessarily
engenders an infinite family of representatives in R, one for each $-word
{u$v, uv$v, u$vv, ...} ⊆ $(P) corresponding to uv^ω ∈ P.

We define dollar closure as a variant of classic closure that is sensitive to the
termination equivalence of the corresponding infinite words:

\[
[u\$v]^{\$}_I = \{\, x\$y : uv^{\omega} \in [xy^{\omega}]^{\omega p}_I \,\}
\]

The termination of uv^ω is implied by the termination of any xy^ω such that x$y is
a member of [u$v]^$_I (see Theorem 2). However, the converse does not necessarily
hold. Therefore, like omega-prefix closure, [u$v]^$_I is not an equivalence class. It
suggests a more relaxed condition (than the one used for FRed($(P))) for the
soundness of a reduction:


Definition 5 (Sound $-Program Reduction). A language R ⊆ P is called
a sound $-program reduction of $(P) under independence relation I iff for all
uv^ω ∈ P we have [u$v]^$_I ∩ R ≠ ∅.


A $-reduction R satisfying the above condition is obviously sound: it must
contain a $-representative x$y ∈ [u$v]^$_I for each word uv^ω in the program. If R
is terminating, then xy^ω is terminating, and therefore so is uv^ω. Moreover, these
sound $-program reductions can be quite parsimonious, since one word can be
an omega-prefix corresponding to many classes of program behaviours.

Under this soundness condition, we may now include one representative of
[u$v]^$_I for each uv^ω ∈ P in a sound reduction of P. For example, R = {$a, $b}
is a sound $-program reduction of P = a^ω || b^ω when (a, b) ∈ I. To illustrate,
note that the only traces of P are the three depicted as Hasse diagrams in Fig. 4;
the distinct program words (ab)^ω, (aba)^ω, (abaa)^ω, ... all correspond to the same
infinite trace shown in Fig. 4(iii). A salient feature of Fig. 4(iii) is that a^ω and b^ω
correspond to disconnected components of this dependence graph. The omega-prefix
rule of Theorem 2 can be interpreted in this graphical context as follows:
if any connected component of the trace is terminating, then the entire class is
terminating.


Fig. 4. The only three traces in P = a^ω || b^ω when (a, b) ∈ I.


Recall that module (d) of the refinement loop of Fig. 2 may naturally be
implemented as the inclusion check P ⊆ Π, or one of its variations that appear
in the proof rules proposed throughout this paper. In a typical inclusion check, a
product of the program and the complement of the proof automata is explored
for the reachability of an accept state. Therefore, classic reduction techniques
that operate on the program by pruning transitions/states during this exploration
are highly desirable in this context. We propose a repurposing of such
techniques that shares the simplicity and efficiency of constructing reductions
from FRed($(P)) (in the style of safety) and yet takes advantage of the weaker
soundness condition in Definition 5 and performs a more aggressive reduction.
In short, a reduced program may be produced by pruning transitions while performing
an on-the-fly exploration of the program automaton. In pruning, our
goal is to discard transitions that would necessarily form words whose suffixes
lead us into the disconnected components of the program traces underlying the
program words that have been explored so far. This selective pruning technique
is provided by a straightforward adaptation of the well-known safety reduction
technique of persistent sets [22]. Consider the program illustrated in Fig. 5(a). In
the graph in Fig. 5(b), the green states are explored and the dashed transitions
are pruned. This amounts to proving two lassos terminating in the refinement
loop of Fig. 2, where each lasso corresponds to one connected component of a
program trace.


while x < z:        while y < z:
  x++                 y++
end                 end

(a)


Fig. 5. Example of persistent set selective search.


We compute persistent sets using a variation of Algorithm 1 in Chap. 4 of
[22]. In brief, a ∈ Persistent_<(q) if a is the lexicographically least enabled
statement at q according to thread order <, if a is an enabled statement from
the same thread as another statement a' ∈ Persistent_<(q), or if
a is dependent on some statement a' ∈ Persistent_<(q) from a different
thread than a. In addition, $ is also persistent whenever it is
enabled. This set may be computed via a fixed-point algorithm; whenever
a statement that is not enabled is added to Persistent_<(q), then
Persistent_<(q) is simply the set of all enabled statements. Intuitively, this
procedure works because transitions are ignored only when they are necessarily
independent from all the statements that will be explored imminently;
these may soundly be ignored indefinitely or deferred. Transitions that are
deferred indefinitely are precisely those that would lead into a disconnected
component of a program trace.

Algorithm 1: PERSISTENTSS

Input: fast$(A_P) = (Q, Σ, δ, q_0, F)
Output: x$y
  H ← ∅, S ← {(q_0, ε)}
  while (q, w) = S.pop() do
    if q ∉ H then
      if q ∈ F then
        return w
      for a ∈ Σ ∩ Persistent(q) do
        S.push(δ(q, a), w · a)
      H ← H ∪ {q}
  return "EMPTY"
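
The search loop of Algorithm 1 can be sketched in Python as follows. The persistent-set computation itself is abstracted as a callback, and the toy callback `least_thread`, which explores only the statement of the least thread that can still move, is a deliberately simplified stand-in and not the fixed-point computation described above; $-transitions are omitted. All names in the sketch are hypothetical.

```python
def persistent_ss(delta, q0, final, persistent):
    """Depth-first selective search in the style of Algorithm 1.

    delta maps (state, letter) to a successor state, and persistent(q)
    returns the letters to explore at q. Returns a word that reaches a
    final state, or None if the reduced search space contains none.
    """
    visited = set()
    stack = [(q0, ())]
    while stack:
        q, w = stack.pop()
        if q in visited:
            continue
        if q in final:
            return w
        for a in persistent(q):
            if (q, a) in delta:
                stack.append((delta[(q, a)], w + (a,)))
        visited.add(q)
    return None


# Product of two one-step threads: 'a' moves thread 1, 'b' moves thread 2.
delta = {
    ((0, 0), "a"): (1, 0),
    ((0, 0), "b"): (0, 1),
    ((1, 0), "b"): (1, 1),
    ((0, 1), "a"): (1, 1),
}


def least_thread(q):
    """Toy persistent set: only the least thread that can still move."""
    return {"a"} if q[0] == 0 else {"b"}


witness = persistent_ss(delta, (0, 0), {(1, 1)}, least_thread)
```

In this toy search only the interleaving in which thread 1 moves first is explored, so the state (0, 1) is never visited and the returned word is `("a", "b")`.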

The reduced program that arises from the persistent set selective search
of fast$(A_P) based on thread order < is denoted by PersistentSS_<($(P)).
Figure 5(b) illustrates a reduced program; note that $-transitions are omitted
for simplicity. The reduced program corresponds to the states shown in green.
The other program states are unreachable because the only persistent transitions
correspond to statements from the least enabled thread; the transitions shown
with dashed lines are not persistent.


Theorem 6 (soundness of finite word reductions). Rule
FINITETERMREDUC is a sound proof rule when Red($(P)) = {∀<: PersistentSS_<($(P))}.


The theorem holds under the condition that the set 𝒯 from Rule
FINITETERMREDUC is the set of all terminating ω-regular languages, and
under the assumption that the program is fair (or, equivalently, that the proof
includes the unfair runs of P, as discussed in Sect. 2.2), where a fair run is one
where no enabled thread action is indefinitely deferred. The proof of soundness
appears in the extended version of this paper [31]. Intuitively, it relies on the
fact that PersistentSS_<($(P)) is a $-program reduction for all the fair runs in
P.


Example 2. Recall the producer-consumer in Fig. 1, and consider the program
with two producers P_1 and P_2 and one consumer C. Let λ_1 denote the loop body
of P_1, and λ_2 that of P_2. Concretely, λ_1 = [i < producer_limit] ; C++ ; i++
where [...] is an assume statement, and similarly for λ_2. In addition, each
loop has an exit statement, which we denote by ι_1 and ι_2. For instance, ι_1 =
[i >= producer_limit]. Let < be such that P_1 < P_2 < C.

In R = PersistentSS_<($(P)), P_1 is the first thread and therefore persistent;
that is, the word $λ_1 (the $-word corresponding to λ_1^ω) is in the reduction. Since
λ_1 is independent of all statements in P_2 and C, any run in which P_1 enters the
loop (and does not exit via ι_1) will not be included in the reduction. In effect,
this means that λ_1^ω is the only representative of
[λ_1^ω]^{ωp}_I = [λ_1^ω]^ω_I ∪ [λ_1^ω · (P_2 + C)^ω]^ω_I in the program reduction.

Even though P_2 seems identical to P_1, the same is not true for P_2 because
it appears later in the thread order. In this case, [λ_2^ω]^{ωp}_I is represented by the
family of words (λ_1)* ι_1 $ λ_2.


5 Omega Regular Reductions 


In the classic implementation of Rule TERMUP [25], ω-regular languages are
used to represent the program P and the proof Π. It is therefore natural to ask
if Red(P) in Rule TERMREDUC can be a family of ω-regular program reductions.
For finite program reductions [16-19], and also for classic POR, lexicographical
normal forms are almost always the choice. Infinite traces have lexicographic
normal forms that are analogous to their finite counterparts [13]. However, these
normal forms are not suitable for defining Red(P). For example, if (a, b) ∈ I,
then the lexicographic normal form of the trace [(ab)^ω]^∞_I is a^ω b^ω if a < b or
b^ω a^ω otherwise; both are transfinite words. Fortunately, Foata normal forms do not
share the same problem.


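Definition 6 (Foata Normal Form of an infinite trace [13]). Foata Normal
Form FNF(t) of an infinite trace t is a sequence of non-empty sets S_1 S_2 ... such
that t = ∏_{i ∈ ω} S_i and for all i:

\[
\begin{array}{ll}
\forall a, b \in S_i.\ a \neq b \implies (a, b) \in I & \text{(no dependencies in } S_i\text{)} \\
\forall b \in S_{i+1}\ \exists a \in S_i.\ (a, b) \notin I & \text{(} S_i \text{ dependent on } S_{i+1}\text{)}
\end{array}
\]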


For example, FNF([(ab)^ω]^∞_I) = (ab)^ω if (a, b) ∈ I. To define a reduction
based on FNF, we need a mild assumption about the program language.


Definition 7 (Closedness). A language L ⊆ Σ^ω is closed under the independence
relation I iff [L]^∞_I ⊆ L and is ω-closed under I iff [L]^ω_I ⊆ L.


It is straightforward to see that for any concurrent program P (as defined in
Sect. 2.1), and any valid independence relation I, we have that P is ω-closed.
This means that for any (infinite) program run τ, any other ω-word τ' that is
equivalent to τ is also in the language of the program.

The key result that makes Foata normal forms amenable to automation in 
the automaton-theoretic framework is the following theorem. 


Theorem 7. If L ⊆ Σ^ω is ω-regular and closed, FNF(L) is ω-regular.


The proof of this theorem provides a construction for the Büchi automaton
that recognizes the language FNF(L); see [31] for more detail. However,
this construction is not efficient since, for a program P of size O(n), the Büchi
automaton recognizing FNF(P) can be as large as O(n 2^n). Finally, Foata reductions
are minimal in the same exact sense that lexicographical reductions of
finite-word languages are minimal:


Theorem 8 (Theorem 11.2.15 [13]). If L ⊆ Σ^ω is ω-regular and closed, then
for allr € L, ' € FNF(L) A [T]? = r =r. 


Our experimental results in Sect. 6 suggest that this complexity is a big bottleneck
in practical benchmarks. Therefore, despite the fact that Foata normal
forms put forward an algorithmic solution for the implementation of Rule
TERMREDUC, the inefficiency of the solution makes it unsuitable
for practical termination checkers.


6 Experimental Results


The techniques presented in this paper have been implemented in a prototype 
tool called TERMUTE written in Python and C++. The inputs are concurrent 
integer programs written in a C-like language. TERMUTE may output “Termi- 
nating”, or “Unknown”, in the latter case also returning a lasso whose termina- 
tion could not be proved. Ranking functions and invariants are produced using 
the method described in [24], which is restricted to linear ranking functions of 
linear lassos. Interpolants are generated using SMTInterpol [6] and MathSAT
[7]; the validity of Hoare triples is checked using CVC4 [2].

TERMUTE may be run in several different modes. FOATA is an imple- 
mentation of the algorithm described in Sect.5. The baseline is the core 
counterexample-guided refinement algorithm of [25], which has been adapted 
to the finite-word context in order to operate on the automata fast$(P) and
fast$(Π) of Sect. 4.1. All other modes are modifications of this baseline,
maintaining the same refinement scheme, so that we can isolate the impact of adding
commutativity reasoning. Hoare triple generalization (“HGen”) augments the 
baseline by making solver calls after each refinement round in order to deter- 
mine if edges may soundly be added to the proof for any valid Hoare triples 
not produced as part of the original proof. “POR” implements the persistent set
technique of Sect. 4.2 and “OPG” is the finite-word analogue of the
$\omega$-prefix generalization in Sect. 3.2. TERMUTE can also be run on any
combination of these techniques. In what follows, we use TERMUTE to refer to
the portfolio winner among all algorithms that employ commutativity reasoning,
namely POR, OPG, POR + HGen, POR + OPG, and POR + OPG + HGen.

See [31] for more detail regarding our experimental setup and results. 


Benchmarks. Our benchmarks include 114 terminating concurrent linear inte- 
ger programs that range from 2 to 12 threads and cover a variety of patterns 
commonly used for synchronization, including the use of locks, barriers, and 
monitors. Some are drawn from the literature on termination verification of 
concurrent programs, specifically [29,34,37], and the rest were created by us, 
some of which are based on sequential benchmarks from The Termination Prob- 
lem Database [38], modified to be multi-threaded. We include programs whose 
threads display a wide range of independence—from complete independence (e.g. 
the producer threads in Fig. 1), all the way to complete dependence—and demon- 
strate a range of complexity with respect to control flow. 


Results. Our experiments have a timeout of 300s and a memory cap of 32 
GB, and were run on a 12th Gen Intel Core i7-12700K with 64 GB of RAM 
running Ubuntu 22.04. We experimented with both interpolating solvers and 
the reported times correspond to the winner of the two. The results are depicted 
in Fig. 6(a) as a quantile plot that compares the algorithms. The total number 
of benchmarks solved is noted on each curve. FOATA times out on all but the 
simplest benchmarks, and is therefore omitted from the plot.

The portfolio winner, TERMUTE, solves 101 benchmarks in total. It solves
every benchmark that is solved by the algorithms without commutativity
reasoning (namely, the baseline or HGen). It is also faster on 95 of the 101
benchmarks it solves. The figure below illustrates how often each of the
portfolio algorithms emerges as the fastest among these 95 benchmarks.


[Bar chart: for each of POR, POR+OPG, POR+HGen, POR+OPG+HGen, and OPG, the
number of these 95 benchmarks on which it is the fastest; horizontal axis from
0 to 95.]


HGen aggressively generalizes the proof and consequently forces convergence
in far fewer refinement rounds. This, however, comes at the cost of a time
overhead per round. Therefore, HGen helps in solving more benchmarks, but
whenever a benchmark is solvable without it, it is solved much faster without
it. The scatter plot in Fig. 6(b) illustrates this phenomenon when HGen is
added to POR+OPG. The plot compares the times of benchmarks solved by both
algorithms on a logarithmic scale, and the overhead caused by HGen is
significant in the majority of the cases.


[Figure 6: (a) quantile plot with x-axis "Number of Benchmarks Solved" (0 to
100) and y-axis from 0 to 300, with one curve each for Baseline, OPG, HGen,
POR, POR+OPG, POR+HGen, and POR+OPG+HGen; (b) scatter plot on logarithmic axes
(0.1 to 1000) with x-axis "POR+OPG Time (seconds)".]


Fig. 6. Experimental results for TERMUTE: (a) quantile plot for the throughput of
each algorithm, and (b) scatter plot for the impact of adding HGen to POR+OPG
on efficiency.


Recall, from Sect.4, that the persistent set algorithm is parametrized on an 
order over the participating threads. The choice of order centrally affects the 
way the persistent set algorithm works, by influencing which transitions may 
be explored and, by extension, which words appear in the reduced program. 
Experimentally, we have observed that the chosen order plays a significant role 
in how well the algorithms work, but to varying degrees. For instance, for POR, 
the worst thread order times out on 16% of the benchmarks that the best order 
solves. For POR+OPG+HGen, the difference is more modest at 7%. In practice,
it is therefore sensible to run a few instances of TERMUTE with different
random orders to increase the chances of getting better performance.

7 Related Work


The contribution of this paper builds upon sequential program termination 
provers to produce termination proofs for concurrent programs. As such, any 
progress in the state of the art in sequential program termination can be used 
to produce proofs for more lassos, and is, therefore, complementary to our app- 
roach. So, we only position this paper in the context of algorithmic concurrent 
program termination, and the use of commutativity for verification in general, 
and skip the rich literature on sequential program termination [11,36] or model 
checking liveness [8,9,26,33]. 


Concurrent Program Termination. The thread-modular approach to prov- 
ing termination of concurrent programs [10,34,35,37] aims to prove a thread’s 
termination without reasoning directly about its interactions with other threads, 
but rather by inferring facts about the thread’s environment. In [37], this app- 
roach is combined with compositional reasoning about termination arguments. 
Our technique can also be viewed as modular in the sense that lassos — which, 
like isolated threads, are effectively sequential programs — are dealt with inde- 
pendently of the broader program in which they appear; however, this is distinct 
from thread-modularity insofar as we reason directly about behaviours arising 
from the interaction of threads. Whenever a thread-modular termination proof 
can be automatically generated for the program, that proof is the most efficient 
in terms of scalability with the number of threads. However, for a thread-modular 
proof to always exist, local thread states have to be exposed as auxiliary infor- 
mation. The modularity in our technique does not rely on this information at 
all. Commutativity can be viewed as a way of observing and taking advantage 
of some degree of non-interference, different from that of thread modularity. 

Causal dependence [29] presents an abstraction refinement scheme for prov- 
ing concurrent programs terminating that takes advantage of the equivalence 
between certain classes of program runs. These classes of runs are determined 
by partial orders that capture the causal dependencies between transitions, in a 
manner reminiscent of the commutativity-based partial orders of Mazurkiewicz 
traces. The key to scalability of this method is that they forgo a containment 
check in the style of module (d) of Fig. 2. Instead, they cover the space of program 
behaviour by splitting it into cases. Therefore, for the producer-only instance
of the example in Sect. 1, this method can easily scale to many threads, while
our commutativity-based technique cannot. Similar to the thread-modular
approach, this technique cannot be beaten in scalability for the programs that
can be split into linearly many cases. However, there is no guarantee (and none
is given in [29]) that a bounded complete causal trace tableau for a
terminating program must exist; for example, when there is a dependency between
loops in different threads that would cause the program to produce unboundedly
many (Mazurkiewicz) traces that have to be analyzed for termination. The
advantage of our method is that, once consumers are added to the example in
Sect. 1, we can still take advantage of all the existing commutativity to gain
more efficiency.

Similar to safety verification, context bounding [3] has been used as a way of
under-approximating concurrent programs for termination analysis as well.


Commutativity in Verification. Program reductions have been used as a 
means of simplifying proofs of concurrent and distributed programs before. Lip- 
ton’s movers [32] have been used to simplify programs for verification. CIVL 
[27,28] uses a combination of abstraction and reduction to produce layered 
programs; in an interactive setup, the programmer can prove that an imple- 
mentation satisfies a specification by moving through these layered programs 
to increasingly more abstract programs. In the context of message-passing dis- 
tributed systems [12,21], commutativity is used to produce a synchronous (rather 
than sequential) program with a simpler proof of correctness. 

In [16-19] program reductions are used in a refinement loop in the same style 
as this paper to prove safety properties of concurrent programs. In [18,19], an
unbounded class of lexicographical reductions is enumerated with the purpose
of finding a simple proof for at least one of the reductions; the thesis being that 
there can be a significant variation in the simplicity of the proof for two different 
reductions. In [19], the idea of contextual commutativity—i.e. considering two 
statements commutative in some context yet not all contexts—is introduced and 
algorithmically implemented. In [16,17], only one reduction at a time is explored, 
in the same style as this paper. In [16], a persistent-set-based algorithm is used 
to produce space-efficient reductions. In [17] the idea of abstract commutativity 
is explored. It is shown that no best abstraction exists that provides a maximal 
amount of commutativity and, therefore, the paper proposes a way to combine 
the benefits of different commutativity relations in one verification algorithm. 
The algorithm in this paper can theoretically take advantage of all of these 
(orthogonal) findings to further increase the impact of commutativity in proving 
termination. 


Non-termination. The problem of detecting non-termination has also been 
directly studied [1,5,20,23,30]. Presently, our technique does not accommodate 
proving the non-termination of a program. However, it is relatively straightfor- 
ward to adapt any such technique (or directly use one of these tools) to accommo- 
date this; in particular, when we fail to find a termination proof for a particular 
lasso, sequential methods for proving non-termination may be employed to deter- 
mine if the lasso is truly a non-termination witness. However, it is important to 
note that a program may be non-terminating while all its lassos are terminat- 
ing, and the refinement loop in Fig.2 may just diverge without producing a 
counterexample in this style; this is a fundamental weakness of using lassos as 
modules to prove termination of programs. 


8 Conclusion 


In the literature on the usage of commutativity in safety verification, sound
program reductions are constructed by selecting lexicographical normal forms of
equivalence classes of concurrent program runs. These are not directly
applicable in the construction of sound program reductions for termination
checking, since the lexicographical normal forms of infinite traces may not be
$\omega$-words. In this paper, we take this apparent shortcoming and turn it
into an effective solution. First, these transfinite words are used in the
design of the omega prefix proof rule (Theorem 2). They also inform the design
of the termination-aware persistent set algorithm described in Sect. 4.2.
Overall, this paper contributes mechanisms for using commutativity-based
reasoning in termination checking, and demonstrates that, using these
mechanisms, one can efficiently check the termination of concurrent programs.

Abstract. Petri nets are an established model of concurrency. A Petri
net is terminating if for every initial marking there is a uniform bound 
on the length of all possible runs. Recent work on the termination of 
Petri nets suggests that, in general, practical models should terminate 
fast, i.e. in polynomial time. In this paper we focus on the termination 
of workflow nets, an established variant of Petri nets used for modelling 
business processes. We partially confirm the intuition on fast termina- 
tion by showing a dichotomy: workflow nets are either non-terminating 
or they terminate in linear time. 

The central problem for workflow nets is to verify a correctness notion 
called soundness. In this paper we are interested in generalised soundness 
which, unlike other variants of soundness, preserves desirable properties 
like composition. We prove that verifying generalised soundness is coNP- 
complete for terminating workflow nets. 

In general the problem is PSPACE-complete, thus intractable. We 
utilize insights from the coNP upper bound to implement a procedure 
for generalised soundness using MILP solvers. Our novel approach is 
a semi-procedure in general, but is complete on the rich class of ter- 
minating workflow nets, which contains around 90% of benchmarks in 
a widely-used benchmark suite. The previous state-of-the-art approach 
for the problem is a different semi-procedure which is complete on the 
incomparable class of so-called free-choice workflow nets, thus our imple- 
mentation improves on and complements the state-of-the-art. 

Lastly, we analyse a variant of termination time that allows paral- 
lelism. This is a natural extension, as workflow nets are a concurrent 
model by design, but the prior termination time analysis assumes sequen- 
tial behavior of the workflow net. The sequential and parallel termination 
times can be seen as upper and lower bounds on the time a process rep- 
resented as a workflow net needs to be executed. In our experimental 
section we show that on some benchmarks the two bounds differ signif- 
icantly, which agrees with the intuition that parallelism is inherent to 
workflow nets. 


Keywords: Workflow · Soundness · Fast termination · Generalised soundness ·
Polynomial time

1 Introduction


Petri nets are a popular formalism to model problems in software verification
[22], business processes [1] and many more (see [42] for a survey). One of the
fundamental problems for such models is the termination problem, i.e. whether
the lengths of all runs are universally bounded. There are two natural variants
of this problem. First, if the initial configuration is fixed then the problem
is effectively equivalent to the boundedness problem, known to be
EXPSPACE-complete for Petri nets [36,41]. Second, if termination must hold for
all initial configurations, the problem is known to be in polynomial time [30],
and such nets are known as structurally terminating. In this paper we are
interested in the latter variant.

Termination time is usually studied for vector addition systems with states
(VASS), an extension of Petri nets that allows the use of control states. In 
particular, the aforementioned EXPSPACE and polynomial time bounds work 
for VASS. In 2018, a deeper study of the termination problem for VASS was 
initiated [12]. This study concerns the asymptotics of the function f(n) bounding 
the length of runs, where n bounds the size of the initial configuration. The focus 
is particularly on classes where f(n) is a polynomial function, suggesting that 
such classes are more relevant for practical applications. This line of work was 
later continued for variants of VASS involving probabilities [11] and games [31]. 

For VASS the function $f(n)$ can asymptotically be as big as $F_i(n)$ in the
Grzegorczyk hierarchy for any finite $i$ (recall that $F_3(n)$ is nonelementary
and $F_\omega(n)$ is Ackermann) [35,43]. It was known that for terminating Petri
nets many problems are considerably simpler [40]. However, to the best of our
knowledge, the asymptotic behaviour of $f(n)$ was not studied for Petri nets.


Our Contributions. In this paper we focus on workflow nets, a class of Petri 
nets widely used to model business processes [1]. Our first result is the following 
dichotomy: any workflow net is either non-terminating or f(n) is linear. This 
confirms the intuition about fast termination of practical models [12]. In our 
proof, we follow the intuition of applying linear algebra from [40] and rely on 
recent results on workflow nets [9]. We further show that the optimal constant
$a_{\mathcal{N}}$ such that $f(n) = a_{\mathcal{N}} \cdot n$ can be computed in
polynomial time. The core of this computation relies on a reduction to
continuous Petri nets [19], a well-known relaxation of Petri nets. Then we can
apply standard tools from the theory of
continuous Petri nets, where many problems are in polynomial time [7,19]. 

For workflow nets, the central decision problems are related to soundness.
There are many variants of this problem (see [2] for a survey). For example,
k-soundness intuitively verifies that k started processes eventually properly
terminate. We are interested in generalised soundness, which verifies whether
k-soundness holds for all k [25-27]. The exact complexity of the most popular
soundness problems was established only recently, in 2022 [9]. Generalised
soundness is surprisingly PSPACE-complete. Other variants, like k-soundness,
are EXPSPACE-complete, thus computationally harder, despite having a seemingly
less complex definition. Moreover, unlike k-soundness and other variants,
generalised soundness preserves desirable properties like composition [26].
Building on our first result, i.e. the dichotomy between non-terminating and
linearly terminating workflow nets, our second result is that generalised
soundness is coNP-complete for terminating workflow nets.

Finally, we observe that the asymptotics of f(n) are defined with the implicit 
assumption that transitions are fired sequentially. Since workflow nets are models 
for parallel executions it is natural to expect that runs would also be performed 
in parallel. Our definition of parallel executions is inspired by similar concepts 
for time Petri nets, and can be seen as a particular case [5]. We propose a 
definition of the optimal running time of runs exploiting parallelism and denote 
this time g(n), where n bounds the size of the initial marking. We show that the 
asymptotic behaviour of g(n), similar to f(n), can be computed in polynomial 
time, for workflow nets with mild assumptions. Together, the two functions f(n) 
and g(n) can be seen as (pessimistic) upper bound and (optimistic) lower bound 
on the time needed for the workflow net to terminate. 


Experiments. Based on our insights, we implement several procedures for prob- 
lems related to termination in workflow nets. Namely, we implement our algo- 
rithms for checking termination, for deciding generalised soundness of work- 
flow nets and for computing the asymptotic behaviour of f(n). We addition- 
ally implement procedures to compute f(k),g(k) and decide k-soundness for 
terminating workflow nets. To demonstrate the efficacy of our procedures, we 
test our implementation on a popular and well-studied benchmark suite of 1382 
workflow nets, originally introduced in [18]. It turns out that the vast majority 
of instances (roughly 90%) is terminating, thus the class of terminating
workflow nets seems highly relevant in practice. Further, we positively
evaluate our algorithm for generalised soundness against a recently proposed
state-of-the-art approach [10] which semi-decides the property in general, and
is further exact on the class of free-choice workflow nets [3]. Interestingly,
our novel approach for generalised soundness is also a semi-procedure in
general, but precise on terminating workflow nets. The approach from [10] is
implemented as an $\exists\forall$-formula from $FO(\mathbb{Q}, <, +)$, while
our approach manages to avoid any quantifier alternations. It turns out that
our approach is faster on over 95% of benchmark instances, and thus
significantly improves upon the state of the art. The mean analysis time for
our approach is just 12.8 ms, while it is about 2 s for the previous state of
the art. In addition, the classes of free-choice and terminating workflow nets
are incomparable, thus our approach complements the state of the art.


Related Work. For general Petri nets and VASS the most well-known problem
is reachability, recently shown to be Ackermann-complete [14,33,34]. Despite its
high complexity, there are tools for the problem [16,45], including some based
on integer and continuous relaxations [6,8,21]. Reachability was also studied
in the context of terminating models. In particular, it is PSPACE-complete for
structurally terminating Petri nets [40] and EXPSPACE-complete for polynomially
terminating VASS [32].

Most algorithms for soundness are based on reductions to reachability [1]; this
is the case for the first algorithms for generalised soundness [25,27]. However,
such reductions only imply Ackermannian upper bounds on the problem, while
a direct study yielded elementary complexities [9].

A different class of approaches for soundness relies on reduction rules, which 
can be applied iteratively to reduce the size of a net while exactly preserving 
soundness [4,39]. These approaches are not precise in general, but can be for 
subclasses, e.g. for live and bounded free-choice workflow nets [15]. We use a 
certain set of reduction rules [13] for generalised soundness in our experimental 
evaluation. 

There exist many established tools and frameworks for the analysis of work- 
flow nets, for example Woflan [44], WoPeD [20], and ProM [17]. However, when 
it comes to soundness problems, these tools typically focus on k-soundness, with 
a particular focus on k = 1 (except for the discussed tool in [10]). 


Organisation. In Sect. 2 we define the models, problems and basic notation. In
Sect. 3 we prove the dichotomy between non-terminating and linear workflow
nets. Then, we show how to compute the linear constants for terminating
workflow nets in Sect. 4. Building on the dichotomy we show that generalised
soundness is coNP-complete in Sect. 5. In Sect. 6 we define and compute a
variant of termination time that takes into account parallelism. We present our
experimental results in Sect. 7. Most proofs can be found in the appendix.


2 Preliminaries 


We write $\mathbb{N}$, $\mathbb{N}_{>0}$, $\mathbb{Z}$, $\mathbb{Q}$ and
$\mathbb{Q}_{\geq 0}$ for the naturals (including 0), the naturals without 0,
the integers, the rationals, and the nonnegative rationals, respectively.

Let $N$ be a set of numbers, e.g. $N = \mathbb{N}$. For
$d, d_1, d_2 \in \mathbb{N}_{>0}$ we write $N^d$ for the set of vectors with
elements from $N$ in dimension $d$. Similarly, $N^{d_1 \times d_2}$ is the set
of matrices with $d_1$ rows and $d_2$ columns and elements from $N$. We use
bold font for vectors and matrices. For $a \in \mathbb{Q}$ and
$d \in \mathbb{N}_{>0}$, we write
$\mathbf{a}^d := (a, a, \dots, a) \in \mathbb{Q}^d$ (or $\mathbf{a}$ if $d$ is
clear from context). In particular $\mathbf{0}^d = \mathbf{0}$ is the zero
vector.

Sometimes it is more convenient to have vectors with coordinates in a finite
set. Thus, for a finite set $S$, we write $\mathbb{N}^S$, $\mathbb{Z}^S$, and
$\mathbb{Q}^S$ for the set of vectors over naturals, integers and rationals.
Given a vector $\mathbf{v}$ and an element $s \in S$, we write $\mathbf{v}(s)$
for the value $\mathbf{v}$ assigns to $s$.

Given $\mathbf{v}, \mathbf{w} \in \mathbb{Q}^S$, we write
$\mathbf{v} \leq \mathbf{w}$ if $\mathbf{v}(s) \leq \mathbf{w}(s)$ for all
$s \in S$, and $\mathbf{v} < \mathbf{w}$ if $\mathbf{v} \leq \mathbf{w}$ and
$\mathbf{v}(s) < \mathbf{w}(s)$ for some $s \in S$. The size of $S$, denoted
$|S|$, is the number of elements in $S$. We define the norm of a vector
$\|\mathbf{v}\| = \max_{s \in S} |\mathbf{v}(s)|$, and the norm of a matrix
$\mathbf{A} \in \mathbb{Q}^{m \times n}$ as
$\|\mathbf{A}\| := \max_{1 \leq j \leq m,\, 1 \leq i \leq n} |\mathbf{A}(i, j)|$.
For a set $S \subseteq \mathbb{Q}^d$, we denote by
$\overline{S} \subseteq \mathbb{R}^d$ the closure of $S$ in the Euclidean space.

2.1 (Integer) Linear Programs


Let $n, m \in \mathbb{N}_{>0}$, $\mathbf{A} \in \mathbb{Z}^{m \times n}$, and
$\mathbf{b} \in \mathbb{Z}^m$. We say that
$G := \mathbf{A}\mathbf{x} \leq \mathbf{b}$ is a system of linear inequalities
with $m$ inequalities and $n$ variables. The norm of a system $G$ is defined as
$\|G\| := \|\mathbf{A}\| + \|\mathbf{b}\| + m + n$. An $(m \times n)$-ILP, short
for integer linear program, is a system of linear inequalities with $m$
inequalities and $n$ variables, where we are interested in the integer
solutions. An $(m \times n)$-LP is such a system where we are interested in the
rational solutions. We use the term MILP, short for mixed integer linear
program, for a system where some variables are allowed to take on rational
values, while others are restricted to integer values.

We allow syntactic sugar in ILPs and LPs, such as allowing constraints
$x \geq y$, $x = y$, and $x < y$ (the last only in the case of ILPs). Sometimes
we are interested in finding optimal solutions. This means we have an objective
function, formally a linear function on the variables of the system, and look
for a solution that either maximizes or minimizes the value of that function.
For LPs, finding an optimal solution can be done in polynomial time, while this
is NP-complete for ILPs and MILPs.
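
As a small illustration of the syntactic sugar (a sketch using the standard
rewriting, which the text does not spell out), each sugared constraint can be
expressed with non-strict inequalities of the form used in $G$:

```latex
\begin{align*}
x \geq y &\iff y - x \leq 0, \\
x = y    &\iff x - y \leq 0 \ \wedge\ y - x \leq 0, \\
x < y    &\iff x - y \leq -1 \qquad \text{(integer variables only)}.
\end{align*}
```

The last equivalence relies on $x$ and $y$ taking integer values, which is
consistent with strict inequalities being allowed only in ILPs.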


2.2 Petri Nets 


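A Petri net $\mathcal{N}$ is a triple $(P, T, F)$, where $P$ is a finite set of
places; $T$ is a finite set of transitions such that $T \cap P = \emptyset$; and
$F \colon ((P \times T) \cup (T \times P)) \to \mathbb{N}$ is a function
describing its arcs. A marking is a vector $\mathbf{m} \in \mathbb{N}^P$. We say
that $\mathbf{m}(p)$ is the number of tokens in place $p \in P$ and $p$ is
marked if $\mathbf{m}(p) > 0$. To write markings, we list only non-zero token
amounts. For example, $\mathbf{m} = \{p_1: 2, p_2: 1\}$ is the marking
$\mathbf{m}$ with $\mathbf{m}(p_1) = 2$, $\mathbf{m}(p_2) = 1$ and
$\mathbf{m}(p) = 0$ for all $p \in P \setminus \{p_1, p_2\}$.

Let $t \in T$. We define the vector ${}^\bullet t \in \mathbb{N}^P$ by
${}^\bullet t(p) = F(p, t)$ for $p \in P$. Similarly, the vector
$t^\bullet \in \mathbb{N}^P$ is defined by $t^\bullet(p) = F(t, p)$ for
$p \in P$. We write the effect of $t$ as $\Delta(t) = t^\bullet - {}^\bullet t$.
A transition $t$ is enabled in a marking $\mathbf{m}$ if
$\mathbf{m} \geq {}^\bullet t$. If $t$ is enabled in the marking $\mathbf{m}$,
we can fire it, which leads to the marking
$\mathbf{m}' := \mathbf{m} + \Delta(t)$, which we denote
$\mathbf{m} \xrightarrow{t} \mathbf{m}'$. We write $\mathbf{m} \to \mathbf{m}'$
if there exists some $t \in T$ such that
$\mathbf{m} \xrightarrow{t} \mathbf{m}'$.

A sequence of transitions $\pi = t_1 t_2 \dots t_n$ is called a run. We denote
the length of $\pi$ as $|\pi| := n$. A run $\pi$ is enabled in a marking
$\mathbf{m}$ iff
$\mathbf{m} \xrightarrow{t_1} \mathbf{m}_1 \xrightarrow{t_2} \mathbf{m}_2 \dots \mathbf{m}_{n-1} \xrightarrow{t_n} \mathbf{m}'$
for some markings
$\mathbf{m}_1, \mathbf{m}_2, \dots, \mathbf{m}' \in \mathbb{N}^P$. The set of
all runs enabled in $\mathbf{m}$ is denoted
$\mathit{Runs}^{\mathbf{m}}_{\mathcal{N}}$, i.e.
$\pi \in \mathit{Runs}^{\mathbf{m}}_{\mathcal{N}}$ if $\pi$ is enabled in
$\mathbf{m}$. The effect of $\pi$ is
$\Delta(\pi) = \sum_{i \in [1..n]} \Delta(t_i)$. Firing $\pi$ from $\mathbf{m}$
leads to a marking $\mathbf{m}'$, denoted
$\mathbf{m} \xrightarrow{\pi} \mathbf{m}'$, iff
$\pi \in \mathit{Runs}^{\mathbf{m}}_{\mathcal{N}}$ and
$\mathbf{m}' = \mathbf{m} + \Delta(\pi)$. We denote by $\to^*$ the reflexive,
transitive closure of $\to$. Given two runs $\pi = t_1 t_2 \dots t_n$ and
$\pi' = t'_1 t'_2 \dots t'_{n'}$ we denote
$\pi\pi' := t_1 t_2 \dots t_n t'_1 t'_2 \dots t'_{n'}$.

The size of a Petri net is defined as $|\mathcal{N}| = |P| + |T| + |F|$. We
define the norm of $\mathcal{N}$ as $\|\mathcal{N}\| := \|F\| + 1$, where we
view $F$ as a vector in $\mathbb{N}^{(P \times T) \cup (T \times P)}$.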
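
The definitions of enabledness and firing can be sketched in a few lines of
Python. This is only an illustration under assumptions: the net below is a
hypothetical example (it is not the net of Fig. 1), and the names `F`, `pre`,
`post`, `enabled`, `fire` and `fire_run` are ours, not the paper's.

```python
# Hypothetical net: arcs F map (source, target) pairs to positive weights;
# pairs that are absent have weight 0.
PLACES = {"p1", "p2", "p3"}
TRANSITIONS = {"t1", "t2"}
F = {("p1", "t1"): 1, ("t1", "p2"): 2, ("p2", "t2"): 1, ("t2", "p3"): 1}


def pre(t):
    """The pre-vector of t: tokens that t consumes from each place."""
    return {p: F[(p, t)] for p in PLACES if (p, t) in F}


def post(t):
    """The post-vector of t: tokens that t produces in each place."""
    return {p: F[(t, p)] for p in PLACES if (t, p) in F}


def enabled(m, t):
    """t is enabled in m iff m is componentwise at least the pre-vector."""
    return all(m.get(p, 0) >= w for p, w in pre(t).items())


def fire(m, t):
    """Return m + Delta(t); raise an error if t is not enabled in m."""
    if not enabled(m, t):
        raise ValueError(f"{t} is not enabled in {m}")
    m2 = dict(m)
    for p, w in pre(t).items():
        m2[p] = m2.get(p, 0) - w
    for p, w in post(t).items():
        m2[p] = m2.get(p, 0) + w
    # Keep only non-zero entries, matching the notation for markings.
    return {p: n for p, n in m2.items() if n != 0}


def fire_run(m, run):
    """Fire the transitions of a run one after another."""
    for t in run:
        m = fire(m, t)
    return m


print(fire_run({"p1": 1}, ["t1", "t2", "t2"]))  # {'p3': 2}
```

Markings are stored as dictionaries that omit zero entries, mirroring the
convention of listing only non-zero token amounts.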

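Fig. 1. A Petri net with places $p_1, p_2, p_3, p_4$ and transitions $t_1, t_2$.
Marking $\{p_1: 2, p_4: 1\}$ is drawn. No transition is enabled.

We also consider several variants of the firing semantics of Petri nets which we
will need throughout the paper. In the integer semantics, we consider markings
over $\mathbb{Z}^P$, and transitions can be fired without being enabled. To
denote the firing and reachability relations, we use the notations
$\to_{\mathbb{Z}}$ and $\to^*_{\mathbb{Z}}$. In the continuous semantics [19],
we consider markings over $\mathbb{Q}^P_{\geq 0}$. Given $t \in T$ and a scaling
factor $\beta \in \mathbb{Q}_{>0}$ (see footnote 1), the effect of firing
$\beta t$ is $\Delta(\beta t) = \beta \cdot \Delta(t)$. Further, $\beta t$ is
enabled in a marking $\mathbf{m}$ iff
$\beta \cdot {}^\bullet t \leq \mathbf{m}$. We use $\to_{\mathbb{Q}_{\geq 0}}$
for the continuous semantics, that is,
$\mathbf{m} \xrightarrow{\beta t}_{\mathbb{Q}_{\geq 0}} \mathbf{m}'$ means
$\beta t$ is enabled in $\mathbf{m}$ and
$\mathbf{m}' = \mathbf{m} + \Delta(\beta t)$. A continuous run $\pi$ is a
sequence of factors and transitions
$\beta_1 t_1 \beta_2 t_2 \dots \beta_n t_n$. Enabledness and firing are extended
to continuous runs:
$\mathbf{m} \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \mathbf{m}'$ holds iff there
exist $\mathbf{m}_1, \dots, \mathbf{m}_{n-1}$ such that
$\mathbf{m} \xrightarrow{\beta_1 t_1}_{\mathbb{Q}_{\geq 0}} \mathbf{m}_1 \xrightarrow{\beta_2 t_2}_{\mathbb{Q}_{\geq 0}} \cdots \mathbf{m}_{n-1} \xrightarrow{\beta_n t_n}_{\mathbb{Q}_{\geq 0}} \mathbf{m}'$.
The length of $\pi$ is $|\pi| = \sum_{i=1}^{n} \beta_i$. Given
$\alpha \in \mathbb{Q}_{>0}$ and a run
$\pi = \beta_1 t_1 \beta_2 t_2 \dots \beta_n t_n$ we write $\alpha\pi$ to denote
the run $(\alpha\beta_1) t_1 (\alpha\beta_2) t_2 \dots (\alpha\beta_n) t_n$. We
introduce a lemma stating that continuous runs can be rescaled.

Lemma 1 (Lemma 12(1) in [19]). Let $\alpha \in \mathbb{Q}_{>0}$. Then
$\mathbf{m} \xrightarrow{\pi}_{\mathbb{Q}_{\geq 0}} \mathbf{m}'$ if and only if
$\alpha\mathbf{m} \xrightarrow{\alpha\pi}_{\mathbb{Q}_{\geq 0}} \alpha\mathbf{m}'$.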
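
The rescaling property of Lemma 1 can be checked on a small instance. The
sketch below is an illustration only: the net (`PRE`, `POST`) is hypothetical,
the function names are ours, and exact rational arithmetic is provided by
Python's `fractions.Fraction`.

```python
from fractions import Fraction

# Hypothetical net: t1 turns two tokens of p1 into one token of p2,
# and t2 moves one token from p2 to p3.
PRE = {"t1": {"p1": 2}, "t2": {"p2": 1}}
POST = {"t1": {"p2": 1}, "t2": {"p3": 1}}


def fire_scaled(m, beta, t):
    """Fire beta*t under the continuous semantics; return None if
    beta times the pre-vector of t is not covered by m."""
    beta = Fraction(beta)
    if beta <= 0 or any(m.get(p, 0) < beta * w for p, w in PRE[t].items()):
        return None
    m2 = {p: Fraction(v) for p, v in m.items()}
    for p, w in PRE[t].items():
        m2[p] = m2.get(p, Fraction(0)) - beta * w
    for p, w in POST[t].items():
        m2[p] = m2.get(p, Fraction(0)) + beta * w
    return {p: v for p, v in m2.items() if v != 0}


def fire_continuous_run(m, run):
    """run is a list of (beta, t) pairs; returns the reached marking or None."""
    for beta, t in run:
        m = fire_scaled(m, beta, t)
        if m is None:
            return None
    return m


def scale_marking(alpha, m):
    return {p: Fraction(alpha) * v for p, v in m.items()}


def scale_run(alpha, run):
    return [(Fraction(alpha) * Fraction(beta), t) for beta, t in run]


m = {"p1": 1}
run = [(Fraction(1, 2), "t1"), (Fraction(1, 4), "t2")]
target = fire_continuous_run(m, run)
alpha = Fraction(3)
# Rescaling the markings and the run by the same alpha preserves firing.
assert fire_continuous_run(scale_marking(alpha, m), scale_run(alpha, run)) \
    == scale_marking(alpha, target)
```

In this example `t1` cannot fire from `{"p1": 1}` under the normal semantics,
since it needs two tokens, but half of it can fire under the continuous
semantics.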

Each run under normal semantics or integer semantics is equivalent to a
continuous run, i.e. $t_1 t_2 \dots t_n \equiv 1 t_1 1 t_2 \dots 1 t_n$. Given
$\pi \in \mathit{Runs}^{\mathbf{m}}_{\mathcal{N}}$ (i.e. a standard run) we
define $\alpha\pi = \alpha\pi_c$ where $\pi_c \equiv \pi$ is a continuous run.
If $\pi_c = \beta_1 t_1 \dots \beta_n t_n$ with $\beta_i \in \mathbb{N}$ for all
$i \in \{1, \dots, n\}$ then we also call $\pi_c$ a (standard) run, i.e. the run
where every transition $t_i$ is repeated $\beta_i$ times.

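We define the set of continuous runs enabled from $\mathbf{m} \in \mathbb{N}^P$
in $\mathcal{N}$ as $\mathit{CRuns}^{\mathbf{m}}_{\mathcal{N}}$. The Parikh
image of a continuous run $\pi = \beta_1 t_1 \beta_2 t_2 \dots \beta_n t_n$ is
the vector $\mathbf{R}_\pi \in \mathbb{Q}^T_{\geq 0}$ such that
$\mathbf{R}_\pi(t) = \sum_{i \mid t_i = t} \beta_i$. For a (standard) run $\pi$
we define its Parikh image $\mathbf{R}_\pi := \mathbf{R}_{\pi_c}$ where
$\pi_c \equiv \pi$. Given a vector $\mathbf{R} \in \mathbb{Q}^T_{\geq 0}$, we
define $\Delta(\mathbf{R}) = \sum_{t \in T} \mathbf{R}(t) \cdot \Delta(t)$,
${}^\bullet\mathbf{R} = \sum_{t \in T} \mathbf{R}(t) \cdot {}^\bullet t$, and
$\mathbf{R}^\bullet = \sum_{t \in T} \mathbf{R}(t) \cdot t^\bullet$. Note that
$\mathbf{R}$ is essentially a run without imposing an order on the transitions.
For ease of notation, we define $\Delta(T)$ as a matrix with columns indexed by
$T$ and rows indexed by $P$, where $\Delta(T)(t)(p) := \Delta(t)(p)$. Then
$\Delta(\mathbf{R}) = \Delta(T) \cdot \mathbf{R}$.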
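
The relation between a run, its Parikh image, and its effect can be checked
numerically. The following sketch uses a hypothetical net and our own function
names; it only illustrates that the effect of a run depends on its Parikh
image and not on the order of its transitions.

```python
from fractions import Fraction

# Hypothetical net given by its pre- and post-vectors.
PLACES = ["p1", "p2", "p3"]
TRANSITIONS = ["t1", "t2"]
PRE = {"t1": {"p1": 2}, "t2": {"p2": 1}}
POST = {"t1": {"p2": 1}, "t2": {"p3": 1}}


def delta(t):
    """Effect of t (post-vector minus pre-vector) as a dict over all places."""
    return {p: POST[t].get(p, 0) - PRE[t].get(p, 0) for p in PLACES}


def parikh(run):
    """Parikh image of a continuous run [(beta_1, t_1), ...]: the factors of
    each transition are summed and their order is forgotten."""
    image = {t: Fraction(0) for t in TRANSITIONS}
    for beta, t in run:
        image[t] += Fraction(beta)
    return image


def effect_of_vector(R):
    """Delta(R): sum over t of R(t) * Delta(t), i.e. the product Delta(T) R."""
    return {p: sum(R[t] * delta(t)[p] for t in TRANSITIONS) for p in PLACES}


def effect_of_run(run):
    """Effect of a run, accumulated step by step."""
    total = {p: Fraction(0) for p in PLACES}
    for beta, t in run:
        for p, d in delta(t).items():
            total[p] += Fraction(beta) * d
    return total


run = [(Fraction(1, 2), "t1"), (Fraction(1, 4), "t2"), (Fraction(1, 2), "t1")]
R = parikh(run)
assert effect_of_vector(R) == effect_of_run(run)
```

Note that the effect is computed regardless of whether the run is enabled,
which matches the integer semantics rather than the normal one.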
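Example 1. Consider the Petri net drawn in Fig. 1. Marking
$\mathbf{m} := \{p_1: 2, p_4: 1\}$ enables no transitions. However, we have
$\mathbf{m} \xrightarrow{t_1 t_2}_{\mathbb{Z}} \{p_3: 2\}$. We also have
$\mathbf{m} \xrightarrow{t_2 t_1}_{\mathbb{Z}} \{p_3: 2\}$, since the transition
order does not matter under the integer semantics. Thus, when we take
$\mathbf{R} = \{t_1: 1, t_2: 1\}$, we have
$\mathbf{m} \xrightarrow{\mathbf{R}}_{\mathbb{Z}} \{p_3: 2\}$.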

Under the continuous semantics we can fire $\frac{1}{2} t_1$, which is impossible under


the normal semantics. For example, we have m —9 A i {p1: 1, p2: 1/2} —> aie 


{pit 1, p3: 1,p47 1} 4g o {p1: 1/3, p2: 1/3, p3: 1, p4: ap} 


1 Sometimes scaling factors are defined to be at most 1. The definitions are equivalent: 
Scaling larger than 1 can be done by firing the same transition multiple times. 


138 P. Hofman et al. 


2.3 Workflow Nets


A workflow net is a Petri net $\mathcal{N}$ such that:

- there exists an initial place $i$ with $F(t, i) = 0$ for all $t \in T$ (i.e.
  no tokens can be added to $i$);
- there exists a final place $f$ with $F(f, t) = 0$ for all $t \in T$ (i.e. no
  tokens can be removed from $f$); and
- in the graph $(V, E)$ with $V = P \cup T$ and $(u, v) \in E$ iff
  $F(u, v) \neq 0$, each $v \in V$ lies on at least one path from $i$ to $f$.

We say that $\mathcal{N}$ is $k$-sound iff for all $\mathbf{m}$,
$\{i: k\} \to^* \mathbf{m}$ implies $\mathbf{m} \to^* \{f: k\}$. Further, we say
$\mathcal{N}$ is generalised sound iff it is $k$-sound for all $k$.
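
For small nets whose reachability set from $\{i: k\}$ is finite, the definition
of $k$-soundness can be checked by explicit exploration. The sketch below is a
naive illustration and not the procedure developed in this paper; the net and
all function names are hypothetical, and a state limit guards against
unbounded exploration.

```python
from collections import deque

# Hypothetical workflow net: t1 moves a token from i to p, t2 moves a token
# from p to f, and t3 merges two tokens of p into a single token in f.
PRE = {"t1": {"i": 1}, "t2": {"p": 1}, "t3": {"p": 2}}
POST = {"t1": {"p": 1}, "t2": {"f": 1}, "t3": {"f": 1}}


def freeze(m):
    """Hashable representation of a marking that omits zero entries."""
    return frozenset((p, n) for p, n in m.items() if n != 0)


def successors(m):
    """All markings reachable from the frozen marking m in one step."""
    m = dict(m)
    for t in PRE:
        if all(m.get(p, 0) >= w for p, w in PRE[t].items()):
            m2 = dict(m)
            for p, w in PRE[t].items():
                m2[p] -= w
            for p, w in POST[t].items():
                m2[p] = m2.get(p, 0) + w
            yield freeze(m2)


def reachable(start, limit=100_000):
    """Breadth-first exploration of the markings reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        m = queue.popleft()
        for m2 in successors(m):
            if m2 not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state limit exceeded")
                seen.add(m2)
                queue.append(m2)
    return seen


def is_k_sound(k, limit=100_000):
    """Every marking reachable from {i: k} must be able to reach {f: k}."""
    start, final = freeze({"i": k}), freeze({"f": k})
    return all(final in reachable(m, limit) for m in reachable(start, limit))


print(is_k_sound(1), is_k_sound(2))  # True False
```

The example net is 1-sound but not 2-sound, because with two tokens `t3` can
fire and leave only one token in `f`; hence it is not generalised sound.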

A place $p \in P$ is nonredundant if $\{i: k\} \to^* \mathbf{m}$ for some
$k \in \mathbb{N}$ and marking $\mathbf{m}$ with $\mathbf{m}(p) > 0$, and
redundant otherwise. We accordingly say that $\mathcal{N}$ is nonredundant if
all $p \in P$ are nonredundant, otherwise $\mathcal{N}$ is redundant. A
redundant workflow net can be made nonredundant by removing each redundant
place $p \in P$ and all transitions $t$ such that ${}^\bullet t(p) > 0$ or
$t^\bullet(p) > 0$. Note that this does not impact the behaviour of the
workflow net, as the discarded transitions could not be fired in the original
net. A polynomial-time saturation procedure can identify redundant places, see
[27, Thm. 8, Def. 10, Sect. 3.2] and [9, Prop. 5.2].
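
One plausible reading of such a saturation procedure is a least-fixpoint
computation: $i$ can be marked, and a place can be marked once some transition
producing it has only markable input places. The sketch below follows this
reading; it is our own illustration, not necessarily the procedure of [27] or
[9], and the net and function names are hypothetical.

```python
def markable_places(pre, post, initial="i"):
    """Least fixpoint: the initial place is markable, and so is every output
    place of a transition whose input places are all markable."""
    markable = {initial}
    changed = True
    while changed:
        changed = False
        for t in pre:
            outputs = set(post.get(t, {}))
            if set(pre[t]) <= markable and not outputs <= markable:
                markable |= outputs
                changed = True
    return markable


def remove_redundant(places, pre, post, initial="i"):
    """Drop redundant places and every transition touching one of them."""
    keep = markable_places(pre, post, initial)
    live = [t for t in pre
            if set(pre[t]) <= keep and set(post.get(t, {})) <= keep]
    return (keep & set(places),
            {t: pre[t] for t in live},
            {t: post.get(t, {}) for t in live})


# Hypothetical net in which t1 needs a token in b, but b is only produced
# by t2, which in turn needs the output of t1: nothing can ever fire.
PLACES = {"i", "a", "b", "f"}
PRE = {"t1": {"i": 1, "b": 1}, "t2": {"a": 1}}
POST = {"t1": {"a": 1}, "t2": {"f": 1, "b": 1}}

print(sorted(markable_places(PRE, POST)))  # ['i']
```

The fixpoint is sound for arbitrary $k$ because runs from $\{i: k\}$ and
$\{i: k'\}$ can be combined into a run from $\{i: k + k'\}$, so enough tokens
can always be gathered in markable input places. Each pass over the
transitions adds at least one place or stops, so the loop runs at most $|P|$
times.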

If $\mathcal{N}$ is a workflow net, we write $\mathit{Runs}^k_{\mathcal{N}}$ for
the set of runs that are enabled from the marking $\{i: k\}$, and
$\mathit{CRuns}^k_{\mathcal{N}}$ for the same for continuous runs. Lemma 1
implies that if $\pi \in \mathit{Runs}^k_{\mathcal{N}}$ then
$\frac{1}{k}\pi \in \mathit{CRuns}^1_{\mathcal{N}}$. The converse does not need
to hold as the rescaled continuous run need not have natural coefficients.


Example 2. The Petri net in Fig. 1 can be seen as a workflow net with initial
place $p_1$ and final place $p_3$. The workflow net is not $k$-sound for any
$k$. Further, the net is redundant: $\{i: k\}$ is a deadlock for every $k$, so
places $p_2$, $p_3$ and $p_4$ are redundant.


2.4 Termination Complexity 


Let N be a workflow net. Let us define as MaxTimey(k) the supremum of 
lengths among runs enabled in {i: k}, that is, MarTimey(k) = sup{|r| | 7 € 
Runsh;}. We say that M is terminating if MazTimeyw (k) £ 00 for all k € No, 
otherwise it is non-terminating. 

We say that $N$ has polynomial termination time if there exist $d \in \mathbb{N}$, $\ell \in \mathbb{R}$
such that for all $k$,

$\mathit{MaxTime}_N(k) \le \ell \cdot k^d. \qquad (1)$

Further $N$ has linear termination time if Eq. (1) holds with $d = 1$. Even more
fine-grained, $N$ has $a$-linear termination time if Eq. (1) holds for $\ell = a$ and
$d = 1$. Note that any net with $a$-linear termination time also has $(a + b)$-linear
termination time for all $b > 0$. For ease of notation, we call workflow nets that
have linear termination time linear workflow nets, and similarly for $a$-linear.

We define $a_N := \inf\{a \in \mathbb{R} \mid N \text{ is } a\text{-linear}\}$. Note that in particular $N$ is
$a_N$-linear (because the inequality in Eq. (1) is not strict) and that $a_N$ is the
smallest constant with this property.

Fig. 2. Two workflow nets with the initial marking $\{\mathsf{i}\colon 1\}$. The workflow net on the
left-hand side is terminating in linear time. The workflow net on the right-hand side is
the same as the one on the left, but with one extra transition $t_4$. It is non-terminating.


Example 3. The net on the left-hand side of Fig. 2 is terminating. For example,
from $\{\mathsf{i}\colon 2\}$ all runs have length at most 3. It is easy to see that from $\{\mathsf{i}\colon k\}$ all
runs have length at most $3k/2$ (e.g. the run $(t_1 t_2 t_3)^{k/2}$). The net has $a_N = 3/2$.
The net on the right-hand side is non-terminating. From $\{\mathsf{i}\colon 2\}$, all runs of
the form $t_1 t_2 t_4^*$ are enabled. Note that while the net is non-terminating, all runs
from $\{\mathsf{i}\colon 1\}$ have length at most 1 (because $t_3$ and $t_4$ are never enabled). $\triangleleft$
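
To make the definition of $\mathit{MaxTime}_N(k)$ concrete, the following Python sketch computes it by exhaustive search on a small net. The net is hypothetical (a simple chain, not one of the nets of Fig. 2), and the search only terminates on terminating nets; all names are ours.

```python
from functools import lru_cache

# Hypothetical workflow net (NOT a net from Fig. 2): i --t1--> p --t2--> f
PLACES = ("i", "p", "f")
# transition -> (pre, post), each a dict place -> arc weight
NET = {
    "t1": ({"i": 1}, {"p": 1}),
    "t2": ({"p": 1}, {"f": 1}),
}

def enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[PLACES.index(p)] >= w for p, w in pre.items())

def fire(marking, pre, post):
    """Return the marking obtained by firing a transition with the given arcs."""
    m = list(marking)
    for p, w in pre.items():
        m[PLACES.index(p)] -= w
    for p, w in post.items():
        m[PLACES.index(p)] += w
    return tuple(m)

@lru_cache(maxsize=None)
def longest_run(marking):
    """Length of the longest run enabled in `marking` (assumes termination)."""
    best = 0
    for pre, post in NET.values():
        if enabled(marking, pre):
            best = max(best, 1 + longest_run(fire(marking, pre, post)))
    return best

def max_time(k):
    """MaxTime_N(k): longest run from the marking {i: k}."""
    return longest_run(tuple(k if p == "i" else 0 for p in PLACES))

for k in range(1, 5):
    print(k, max_time(k))
```

On this chain every initial token can be moved by exactly two firings, so the computed values grow as $2k$. On a non-terminating net the recursion would not terminate, which is why the sketch is restricted to terminating nets.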


Our definition of termination time is particular to workflow nets, as there it 
is natural to have only i marked initially. It differs from the original definition of 
termination complexity in [12]. In [12] VASS are considered instead of Petri nets, 
and the initial marking is arbitrary. The termination complexity is measured in 
the size of the encoding of m. The core difference is that in [12] it is possible 
to have a fixed number of tokens in some places, but arbitrarily many tokens 
in other places. In Sect. 3 we show an example that highlights the difference
between the two notions. Our definition is a more natural fit for workflow nets, 
and will allow us to reason about soundness. Indeed, our particular definition of 
termination time allows us to obtain the coNP-completeness result of generalised 
soundness for linear workflow nets in Sect. 5. 


3 A Dichotomy of Termination Time in Workflow Nets 


Let us exhibit behaviour in Petri nets that cannot occur in workflow nets.
Consider the net drawn in black in Fig. 3 and a family of initial markings
$\{\{p_1\colon 1, s_1\colon 1, b\colon n\} \mid n \in \mathbb{N}\}$. From the marking $\{p_1\colon 1, s_1\colon 1, b\colon n\}$, all
runs have finite length, yet some run has length exponential in $n$. From the
marking $\{p_1\colon k, s_1\colon 1, b\colon n\}$, the sequence $(t_1 t_2)^k t_4 (t_3)^{2k} t_5$ results in the marking
$\{p_1\colon 2k, s_1\colon 1, b\colon n-1\}$. Thus, following this pattern $n$ times leads from
$\{p_1\colon 1, s_1\colon 1, b\colon n\}$ to $\{p_1\colon 2^n, s_1\colon 1\}$. This behaviour crucially requires us to
keep a single token in $s_1$, while having $n$ tokens in $b$.

We can transform the net into a workflow net, as demonstrated by the colored
part of Fig. 3. However, observe that then


$\{\mathsf{i}\colon 2\} \to^* \{p_1\colon 2, s_1\colon 1, s_2\colon 1, b\colon 1\} \to^{t_1 t_2 t_3} \{p_1\colon 2, s_1\colon 1, s_2\colon 1, b\colon 1, p_3\colon 1\}.$

Note that the sequence $t_1 t_2 t_3$ strictly increased the marking. It can thus be fired
arbitrarily many times, and the workflow net is non-terminating.

It turns out that, contrary to standard Petri nets, there exist no workflow
nets with exponential termination time.² Instead, there is a dichotomy between
non-termination and linear termination time.


Theorem 1. Every workflow net $N$ is either non-terminating or linear. Moreover,
$\mathit{MaxTime}_N(k) \le ak$ for some $a \le \|N\|^{\mathrm{poly}(|N|)}$.


Fig. 3. In black: A Petri net $N$ adapted from [28, Lemma 2.8]. It enables a run with
length exponential in $n$ from marking $\{p_1\colon 1, s_1\colon 1, b\colon n\}$. In color: Additional places
and transitions, which make $N$ a workflow net.


As explained in Sect. 2.3 we can assume that $N$ is nonredundant, i.e. for all
$p \in P$ there exists $k \in \mathbb{N}$ such that $\{\mathsf{i}\colon k\} \to^* m$ with $m(p) > 0$. The first
important ingredient is the following lemma.


Lemma 2. Let $N = (P, T, F)$ be a nonredundant workflow net. Then $N$ is
non-terminating iff there exists a nonzero $R \in \mathbb{N}^T$ such that $\Delta(R) \ge 0$.


Proof (sketch). For the implication from right to left, observe that if we start
from a big initial marking, then it is possible to fill every place with arbitrarily
many tokens. In such a configuration any short run is enabled, so if there is a run
with non-negative effect then it is further possible to repeat it infinitely many times.
For the other implication we reason as follows. If there is an infinite run then
by Dickson's lemma there are $m, m' \in \mathbb{N}^P$ such that for some $k$, it holds that
$\{\mathsf{i}\colon k\} \to^{\pi} m \to^{\rho} m'$ and $m' \ge m$. But then $\Delta(R_\rho) = m' - m \ge 0$.
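
The criterion of Lemma 2 is easy to check for a given candidate vector. The following Python sketch, on a hypothetical net of our own choosing, computes the effect $\Delta(R)$ of a Parikh vector $R$ and tests whether $R$ is a nonzero vector with non-negative effect. It only verifies a given certificate; it does not search for one.

```python
# Hypothetical nonredundant workflow net (illustration only):
#   t1: i -> p,   t2: p -> f,   t3: p -> p + q,   t4: q -> f
PLACES = ["i", "p", "q", "f"]
NET = {
    "t1": ({"i": 1}, {"p": 1}),
    "t2": ({"p": 1}, {"f": 1}),
    "t3": ({"p": 1}, {"p": 1, "q": 1}),
    "t4": ({"q": 1}, {"f": 1}),
}

def effect(parikh):
    """Delta(R): net token change per place for the Parikh vector R."""
    delta = {p: 0 for p in PLACES}
    for t, count in parikh.items():
        pre, post = NET[t]
        for p, w in pre.items():
            delta[p] -= w * count
        for p, w in post.items():
            delta[p] += w * count
    return delta

def is_nontermination_certificate(parikh):
    """True iff R is nonzero, non-negative, and Delta(R) >= 0 componentwise."""
    nonneg = all(c >= 0 for c in parikh.values())
    nonzero = any(c > 0 for c in parikh.values())
    return nonneg and nonzero and all(d >= 0 for d in effect(parikh).values())

print(is_nontermination_certificate({"t3": 1}))  # t3 keeps p and adds a token to q
print(is_nontermination_certificate({"t1": 1}))  # t1 removes a token from i
```

Here firing `t3` once leaves `p` unchanged and adds a token to `q`, so `{"t3": 1}` is a certificate of non-termination, whereas `{"t1": 1}` has a negative effect on `i`.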


We define $\mathit{ILP}_N$ with a $|T|$-dimensional vector of variables $x$ as the following
system of inequalities: $x \ge 0$ and $\Delta(T)x \ge 0 - \{\mathsf{i}\colon \infty\}$.³ The next lemma follows
immediately from the definition of $\to_{\mathbb{Z}}$.


² This is caused by the choice of the family of initial configurations. Fixing the number
of initial tokens in some places can be simulated by control states in the VASS model.

³ This $\infty$ is syntactic sugar to omit the inequality for the place $\mathsf{i}$. Formally $\Delta(T)$ and
$x$ should be projected to ignore $\mathsf{i}$.

Lemma 3. [Adapted from Claim 5.7 in [9]] For every $k \in \mathbb{N}$, $m \in \mathbb{N}^P$, and a
run $\pi$, it holds that $\{\mathsf{i}\colon k\} \to^{\pi}_{\mathbb{Z}} m$ iff $R_\pi$ is a solution to $\mathit{ILP}_N$ with the additional
constraint

$\sum_{i=1}^{|T|} \Delta(t_i)(\mathsf{i}) \cdot R_\pi(t_i) \ge -k.$


Proof (Sketch for Theorem 1). Because of Lemma 3 the Parikh image of every
run (in $\bigcup_{k \in \mathbb{N}} \mathit{Runs}^k_N$) is a solution $R \in \mathbb{N}^T$ of $\Delta(T)R \ge -\{\mathsf{i}\colon \infty\}$. So, we
consider the set of solutions of the system of inequalities $\Delta(T)R \ge -\{\mathsf{i}\colon \infty\}$. It
is a linear set, so the sum of two solutions is again a solution and any solution
can be written as a sum of small solutions with norm smaller than some $c \in \mathbb{N}$.
For such small solutions, the length of any corresponding run is at most $|T| \cdot c$.
Now observe that if the workflow is terminating then there is no nonzero $R \in \mathbb{N}^T$
such that $\Delta(T)R \ge 0$, because of Lemma 2. Since every solution already satisfies
$\Delta(R)(p) \ge 0$ for all places $p \neq \mathsf{i}$, it follows that $\Delta(R)(\mathsf{i}) \le -1$ for any nonzero
solution $R$, so in particular for all small solutions. Let us take a run $\pi \in \mathit{Runs}^k_N$.
We decompose $R_\pi$ as a finite sum $\sum_{j=1}^{\ell} R_j$ where the $R_j$ are from the set of small
solutions. We have $-k \le \Delta(R_\pi)(\mathsf{i}) = \sum_{j=1}^{\ell} \Delta(R_j)(\mathsf{i}) \le \sum_{j=1}^{\ell} -1 = -\ell$. Recall that
the norm of small solutions is bounded by $c$. It follows that the length of the run
$\pi$ is bounded by $\ell \cdot |T| \cdot c \le k \cdot |T| \cdot c$. So the workflow is $|T| \cdot c$-linear.
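
The counting argument at the end of the sketch can be written as a short chain of inequalities. The block below restates it under the assumption that the norm of a small solution bounds each of its entries by $c$, so that a small solution $R_j$ contributes at most $|T| \cdot c$ firings; the concrete numbers in the last line are purely illustrative.

```latex
\begin{align*}
|\pi| &= \sum_{j=1}^{\ell} \sum_{t \in T} R_j(t) \;\le\; \ell \cdot |T| \cdot c, \\
-k &\le \Delta(R_\pi)(\mathsf{i}) = \sum_{j=1}^{\ell} \Delta(R_j)(\mathsf{i}) \le -\ell
   \quad\Longrightarrow\quad \ell \le k, \\
|\pi| &\le k \cdot |T| \cdot c .
\end{align*}
% Illustrative numbers only: with |T| = 3, c = 2 and k = 5 initial tokens,
% every run has length at most 5 * 3 * 2 = 30.
```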


4 Refining Termination Time 


Recall that $a_N$ is the smallest constant such that $N$ is $a_N$-linear. In this section,
we are interested in computing $a_N$. This number is interesting, as it can give
insights into the shape and complexity of the net, i.e. a large $a_N$ implies
complicated runs firing transitions several times, while a small $a_N$ implies some degree
of choice, where not all transitions can be fired for each initial token.

The main goal of this section is to show an algorithm for computing $a_N$. Our
algorithm handles the more general class of aggregates on workflow nets, and
we can compute $a_N$ as such an aggregate. More formally, let $N = (P, T, F)$ be
a workflow net. An aggregate is a linear map $f \colon \mathbb{Q}^T \to \mathbb{Q}$. The aggregate of a
(continuous) run $\pi$ is the aggregate of its Parikh image, that is $f(\pi) = f(R_\pi)$.


Example 4. Consider the aggregate $f_{all}(\pi) = \sum_{t \in T} R_\pi(t) = |\pi|$, which
computes the number of occurrences of all transitions. Let us consider two other
natural aggregates. The aggregate $f_t(\pi) := R_\pi(t)$ computes the number of
occurrences of transition $t$, and $f_p(\pi) = \sum_{t \in T} \Delta(t)(p) \cdot R_\pi(t)$ computes the number
of tokens added to place $p$. Another use for aggregates is counting transitions, but
with different weights for each transition, thus simulating e.g. different costs. $\triangleleft$
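
The aggregates of Example 4 can be evaluated directly from the Parikh image of a run. The following Python sketch does so on a hypothetical two-transition net; the function names mirror the symbols of the example, and the cost table passed to `f_weighted` is an illustrative assumption.

```python
from collections import Counter

# Hypothetical net (illustration only): i --t1--> p --t2--> f
NET = {
    "t1": ({"i": 1}, {"p": 1}),
    "t2": ({"p": 1}, {"f": 1}),
}

def parikh(run):
    """Parikh image R_pi: how often each transition occurs in the run."""
    return Counter(run)

def f_all(run):
    """Total number of transition occurrences, i.e. the length of the run."""
    return sum(parikh(run).values())

def f_t(run, t):
    """Number of occurrences of transition t."""
    return parikh(run)[t]

def f_p(run, p):
    """Sum over t of Delta(t)(p) * R_pi(t): net number of tokens added to p."""
    total = 0
    for t, n in parikh(run).items():
        pre, post = NET[t]
        total += (post.get(p, 0) - pre.get(p, 0)) * n
    return total

def f_weighted(run, cost):
    """Weighted count of transitions, e.g. to model different costs."""
    return sum(cost[t] * n for t, n in parikh(run).items())

run = ["t1", "t1", "t2"]
print(f_all(run), f_t(run, "t1"), f_p(run, "p"), f_weighted(run, {"t1": 2, "t2": 5}))
```

All four functions are linear in the Parikh image, which is what makes them aggregates in the sense defined above.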


Given a workflow net $N$ and an aggregate $f$ we define


$\mathit{sup}_{f,N} = \sup \left\{ \frac{f(\pi)}{k} \;\middle|\; k \in \mathbb{N}_{>0},\ \pi \in \mathit{Runs}^k_N \right\}. \qquad (2)$


Let us justify the importance of this notion by relating it to $a_N$.


Proposition 1. Let $N$ be a linear workflow net. Then $a_N = \mathit{sup}_{f_{all},N}$.

Proof. Recall that $a_N$ is the smallest number $a$ such that $|\pi| \le a \cdot k$ for all
$k \in \mathbb{N}_{>0}$ and $\pi \in \mathit{Runs}^k_N$. Equivalently, $f_{all}(\pi)/k \le a$. Thus by definition
$\mathit{sup}_{f_{all},N} \le a_N$, and the inequality cannot be strict since $a_N$ is the smallest
number with this property.


Theorem 2. Consider a workflow net $N$ and an aggregate $f$. The value $\mathit{sup}_{f,N}$
can be computed in polynomial time.


Corollary 1. Let $N = (P, T, F)$ be a linear workflow net. The constant $a_N$ can
be computed in polynomial time.


In practice, we can use an LP solver to compute the constant $a_N$. The
algorithm is based on the fact that continuous reachability for Petri nets is in
polynomial time [7,19]. We formulate a lemma that relates the values of aggregates
under the continuous and standard semantics.


Lemma 4. Let $N$ be a Petri net and $f$ be an aggregate.


1. Let $\pi \in \mathit{Runs}^k_N$. Then $\frac{1}{k}\pi \in \mathit{CRuns}^1_N$ and $f(\frac{1}{k}\pi) = f(\pi)/k$.
2. Let $\pi_c \in \mathit{CRuns}^1_N$. There are $k \in \mathbb{N}$ and $\pi \in \mathit{Runs}^k_N$ with $f(\pi_c) = f(\pi)/k$.


Proof. Both items are simple consequences of Lemma 1 and the linearity of
aggregates. Note that for (2), if $\pi_c = \beta_1 t_1 \dots \beta_n t_n$ then it suffices to define $k$
such that $\beta_i \cdot k \in \mathbb{N}$ for all $i \in \{1, \dots, n\}$.
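
The choice of $k$ in item (2) can be made explicit. The following Python sketch represents a continuous run as a list of pairs $(\beta_j, t_j)$ with rational coefficients, takes $k$ to be the least common multiple of the denominators, and expands the run into an integer run. The representation and the function names are our own assumptions; the sketch does not check that the continuous run is enabled.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def scaling_factor(cont_run):
    """Smallest k >= 1 such that beta * k is a natural number for every step."""
    denominators = [beta.denominator for beta, _ in cont_run]
    return reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)

def scale(cont_run, k):
    """Expand beta_1 t_1 ... beta_n t_n into the run t_1^(k*beta_1) ... t_n^(k*beta_n)."""
    run = []
    for beta, t in cont_run:
        run.extend([t] * int(beta * k))
    return run

# Illustrative continuous run: fire t1 by 1/2, then t2 by 1/3.
cont_run = [(Fraction(1, 2), "t1"), (Fraction(1, 3), "t2")]
k = scaling_factor(cont_run)
run = scale(cont_run, k)
print(k, run)
# For the aggregate f_all, f(pi)/k agrees with the sum of the coefficients.
print(Fraction(len(run), k) == sum(beta for beta, _ in cont_run))
```

Any common multiple of the denominators would work equally well; the least common multiple just gives the smallest number of initial tokens for which the rescaled run is integral.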


From the above lemma we immediately conclude the following. 


Corollary 2. It holds that $\mathit{sup}_{f,N} = \sup\{ f(\pi_c) \mid \pi_c \in \mathit{CRuns}^1_N \}$.


Proof (of Theorem 2). We use Corollary 2 and conclude that we have
to compute $\sup\{ f(\pi_c) \mid \pi_c \in \mathit{CRuns}^1_N \}$. Let $S = \{ R_{\pi_c} \mid \pi_c \in \mathit{CRuns}^1_N \}$. As $f(\pi_c)$
is defined as $f(R_{\pi_c})$, we reformulate our problem to compute $\sup\{ f(v) \mid v \in S \}$.
Since $f$ is a continuous function, it holds that $\sup\{ f(v) \mid v \in S \} = \sup\{ f(v) \mid
v \in \overline{S} \}$, where $\overline{S}$ denotes the closure of $S$. Let us define $\mathit{LP}_N$ as an LP with
variables $x := x_1, \dots, x_{|T|}$ and constraints $\Delta(T)x \ge -\{\mathsf{i}\colon 1\}$ and $x \ge 0$.


Claim 1. It holds that $v \in \overline{S}$ if and only if $v$ is a solution to $\mathit{LP}_N$.


We postpone the proof of Claim 1. Claim 1 allows us to rephrase the computation
of $\sup\{ f(v) \mid v \in \overline{S} \}$ as $\mathit{LP}_N$ with the objective to maximise $f(v)$, which can
be done in polynomial time.
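
For the aggregate $f_{all}$ the resulting optimisation problem can be written out explicitly. The first display below is the LP described above; the second instantiates it on a hypothetical chain net $\mathsf{i} \to t_1 \to p \to t_2 \to \mathsf{f}$, which is our own illustrative example and not a net from the paper.

```latex
% LP for sup_{f_all, N}: one variable x_t per transition t.
\begin{align*}
\text{maximise }   & \sum_{t \in T} x_t \\
\text{subject to } & \Delta(T)\,x \ge -\{\mathsf{i}\colon 1\}, \qquad x \ge 0 .
\end{align*}

% Hypothetical chain net: t_1 moves a token from i to p, t_2 moves it from p to f.
\begin{align*}
\text{place } \mathsf{i}\colon \quad & -x_1 \ge -1, \\
\text{place } p\colon \quad          & x_1 - x_2 \ge 0, \\
\text{place } \mathsf{f}\colon \quad & x_2 \ge 0 .
\end{align*}
% The constraints force x_2 <= x_1 <= 1, so the optimum is x_1 = x_2 = 1
% with objective value 2, i.e. a_N = 2 for this net.
```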


What remains is the proof of Claim 1. It constitutes the remaining part of this
section. The claim is a special case of the forthcoming Lemma 8. Its formulation
and proof require some preparation.


Definition 1. A workflow net is good for a set of markings $M \subseteq \mathbb{Q}^P_{\ge 0}$, if for
every place $p$ there are markings $m, m'$ and continuous runs $\pi$ and $\pi'$ such that
$m(p) > 0$, $m' \in M$, and $\{\mathsf{i}\colon 1\} \to^{\pi}_{\mathbb{Q}_{\ge 0}} m \to^{\pi'}_{\mathbb{Q}_{\ge 0}} m'$.

The notion of being good for a set of markings is a refined concept of
nonredundancy. Nonredundancy allows us to mark every place. But if, after
marking the place, we want to continue the run and reach a marking in a specific set
of markings $M \subseteq \mathbb{Q}^P_{\ge 0}$, then we don't know if the given place can be marked.
This motivates Definition 1.


Example 5. Let us consider the workflow net depicted in Fig. 4. It is
nonredundant, as every place can be marked. But it is not good for $\{\mathsf{f}\colon 1\}$ as there is no
continuous run to the marking $\{\mathsf{f}\colon 1\}$. In the initial marking the only enabled
transition is $t_1$ but firing $\beta t_1$ for any $\beta \in \mathbb{Q}_{>0}$ reduces the total amount of tokens
in the net. The lost tokens cannot be recreated so it is not possible to reach
$\{\mathsf{f}\colon 1\}$.


Fig. 4. A Petri net with places $p_1, p_2, p_3$ and transitions $t_1, t_2, t_3$. Marking $\{\mathsf{i}\colon 1\}$ is
drawn.


The important fact is as follows: 


Lemma 5. Let $M \subseteq \mathbb{Q}^P_{\ge 0}$ be a set of solutions of some LP. Then testing if a
net is good for $M$ can be done in polynomial time.


Lemma 6. Suppose a workflow net $N$ is good for $M \subseteq \mathbb{Q}^P_{\ge 0}$ and $M$ is a convex
set. Then there is a marking $m_+$ such that $m_+(p) > 0$ for every $p \in P$ and
there are continuous runs $\pi$, $\pi'$, and a marking $m_f \in M$ such that
$\{\mathsf{i}\colon 1\} \to^{\pi}_{\mathbb{Q}_{\ge 0}} m_+ \to^{\pi'}_{\mathbb{Q}_{\ge 0}} m_f$.

Informally, we prove it by taking a convex combination of $|P|$ runs, one for each
$p \in P$. The last bit needed for the proof of Lemma 8 is the following lemma,
shown in [19].


Lemma 7 ([19], Lemma 13). Let $N$ be a Petri net. Consider $m_0, m \in \mathbb{N}^P$
and $v \in \mathbb{Q}^T_{\ge 0}$ such that:


- $m = m_0 + \Delta(v)$;
- $\forall p \in {}^\bullet v\colon m_0(p) > 0$;
- $\forall p \in v^\bullet\colon m(p) > 0$.


Then there exists a finite continuous run $\pi$ such that $m_0 \to^{\pi}_{\mathbb{Q}_{\ge 0}} m$ and $R_\pi = v$.

Lemma 8. Suppose $M$ is a convex set of markings over $\mathbb{Q}^P_{\ge 0}$ and that the
workflow net is good for $M$. Let $S$ be the set of Parikh images of continuous runs
that start in $\{\mathsf{i}\colon 1\}$ and end in some marking $m' \in M$, i.e.


$S := \{ R_\pi \mid \exists \pi \in \mathit{CRuns}^1_N\ \exists m' \in M \text{ such that } \{\mathsf{i}\colon 1\} \to^{\pi}_{\mathbb{Q}_{\ge 0}} m' \}.$


Then $v \in \overline{S}$ if and only if there is a marking $m \in \overline{M}$ such that
$\Delta(T)v = m - \{\mathsf{i}\colon 1\}$.


In one direction the proof of the lemma is trivial. In the opposite direction,
intuitively, we construct a sequence of runs with Parikh images converging to
$v$. Lemma 6 is used to put $\varepsilon$ in every place (for $\varepsilon \to 0$) and Lemma 7 to
show that there are runs with the Parikh image equal to $\varepsilon x + (1 - \varepsilon)v$ for some $x$
witnessing Lemma 6. We are ready to prove Claim 1.


Claim 1. It holds that $v \in \overline{S}$ if and only if $v$ is a solution to $\mathit{LP}_N$.


Proof. Let $M$ be the set of all markings over $\mathbb{Q}^P_{\ge 0}$, which clearly is convex. As
$N$ is nonredundant we know that every place can be marked via a continuous
run, and because $M$ is the set of all markings we conclude that $N$ is good for
$M$ according to Definition 1. Thus $M$ satisfies the prerequisites of Lemma 8. It
follows that $\overline{S}$ is the set of solutions of a system of linear inequalities. Precisely,
$v \in \overline{S}$ if and only if there is $m \in \mathbb{Q}^P_{\ge 0}$ such that $\Delta(T)v = m - \{\mathsf{i}\colon 1\}$ and $v \ge 0$,
which is equivalent to $\Delta(T)v \ge -\{\mathsf{i}\colon 1\}$ and $v \ge 0$, as required.


5 Soundness in Terminating Workflow Nets 


The dichotomy between linear termination time and non-termination shown in 
Sect. 3 yields an interesting avenue for framing questions in workflow nets. We
know that testing generalised soundness is PSPACE-complete, but the lower 
bound in [9] relies on a reset gadget which makes the net non-terminating. 
Indeed, it turns out that the problem is simpler for linear workflow nets. 


Theorem 3. Generalised soundness is coNP-complete for linear workflow nets. 


A marking $m$ is called a deadlock if $\mathit{Runs}^m_N = \emptyset$, i.e. no transition is enabled
in $m$. To help prove the coNP upper bound, let us introduce a lemma.


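Lemma 9. Let $N$ be a terminating nonredundant workflow net. Then $N$ is
not generalised sound iff there exist $k \in \mathbb{N}$ and a marking $m \in \mathbb{N}^P$ such that
$\{\mathsf{i}\colon k\} \to^*_{\mathbb{Z}} m$, $m$ is a deadlock and $m \neq \{\mathsf{f}\colon k\}$. Moreover, if $\|N\| \le 1$ then
$\{\mathsf{i}\colon k\} \to^*_{\mathbb{Z}} m$ can be replaced with $\{\mathsf{i}\colon k\} \to^*_{\mathbb{Q}} m$.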


The last part of the lemma is not needed for the theoretical results, but it will
speed up the implementation in Sect. 7. We can now show Theorem 3.

Proof (of the coNP upper bound in Theorem 3). Let $N = (P, T, F)$ and denote
$T = \{t_1, \dots, t_n\}$. By Lemma 9, $N$ is not generalised sound iff there are $k \in \mathbb{N}$ and
$m \in \mathbb{N}^P$ such that $\{\mathsf{i}\colon k\} \to^*_{\mathbb{Z}} m$, $m$ is a deadlock and $m \neq \{\mathsf{f}\colon k\}$. We can reduce
the property to an ILP. First, the procedure guesses $|T|$ places $p_1, \dots, p_n \in P$
(one for each transition). For each transition $t_i$, place $p_i$ will prohibit firing $t_i$.
We then construct $\mathit{ILP}_{N,p_1,\dots,p_n}$, which is very
similar to $\mathit{ILP}_N$ (see Sect. 3), but adds additional constraints. They state that
$(\Delta(T)x)(p_j) - {}^\bullet t_j(p_j) < 0$ for every $1 \le j \le n$.

Let us show that there are $p_1, \dots, p_n$ such that $\mathit{ILP}_{N,p_1,\dots,p_n}$ has a solution iff
there exist $k$ and a deadlock $m$ such that $\{\mathsf{i}\colon k\} \to^*_{\mathbb{Z}} m$. Indeed, let $x_1, \dots, x_n$
be a solution of $\mathit{ILP}_{N,p_1,\dots,p_n}$. We denote $k = -\sum_{i=1}^{n} \Delta(t_i)(\mathsf{i}) \cdot x_i$ and $m =
\{\mathsf{i}\colon k\} + \sum_{i=1}^{n} \Delta(t_i) \cdot x_i$. It is clear that $\{\mathsf{i}\colon k\} \to^*_{\mathbb{Z}} m$. The new constraints
ensure that for each $t_j \in T$ there exists $p_j \in P$ such that ${}^\bullet t_j(p_j) > m(p_j)$, thus
$m$ is a deadlock.

To encode the requirement that $m \neq \{\mathsf{f}\colon k\}$, note that there are three cases,
either $m(\mathsf{f}) \le k - 1$, $m(\mathsf{f}) \ge k + 1$, or $m(\mathsf{f}) = k$ but $m - \{\mathsf{f}\colon k\} > 0$. We guess
which case occurs, and add the constraint for that case to $\mathit{ILP}_{N,p_1,\dots,p_n}$.
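
The conditions encoded by the ILP can be checked directly for a given candidate vector $x$. The following Python sketch does this on a hypothetical terminating net of our own: it computes $k$ and $m$ from $x$ as in the proof, and then tests that $m$ is a non-negative deadlock different from $\{\mathsf{f}\colon k\}$. It verifies a witness in the sense of Lemma 9 (reachability under $\to_{\mathbb{Z}}$, so intermediate markings are not inspected) and is not a replacement for the ILP search.

```python
# Hypothetical terminating net (illustration only, not from the paper):
#   t1: i -> p,   t2: i -> q,   t3: p + q -> f
PLACES = ["i", "p", "q", "f"]
NET = {
    "t1": ({"i": 1}, {"p": 1}),
    "t2": ({"i": 1}, {"q": 1}),
    "t3": ({"p": 1, "q": 1}, {"f": 1}),
}

def delta(t, p):
    """Effect Delta(t)(p) of transition t on place p."""
    pre, post = NET[t]
    return post.get(p, 0) - pre.get(p, 0)

def witness(x):
    """Return (k, m) with k = -sum_t Delta(t)(i) * x[t] and m = {i: k} + Delta(x)."""
    k = -sum(delta(t, "i") * n for t, n in x.items())
    m = {p: sum(delta(t, p) * n for t, n in x.items()) for p in PLACES}
    m["i"] += k
    return k, m

def is_deadlock(m):
    """Every transition is blocked by at least one under-marked input place."""
    return all(any(m[p] < w for p, w in pre.items()) for pre, _ in NET.values())

def refutes_generalised_soundness(x):
    """Check the conditions of Lemma 9 for the candidate Parikh vector x."""
    if any(n < 0 for n in x.values()):
        return False
    k, m = witness(x)
    final = {p: (k if p == "f" else 0) for p in PLACES}
    return (k >= 1 and all(v >= 0 for v in m.values())
            and is_deadlock(m) and m != final)

print(refutes_generalised_soundness({"t1": 1, "t2": 0, "t3": 0}))
print(refutes_generalised_soundness({"t1": 1, "t2": 1, "t3": 0}))
```

Firing only `t1` leaves a single token in `p`, where `t3` is still waiting for a token in `q`, so the first vector is a witness. The second vector leads to a marking in which `t3` is enabled, so it is not a deadlock.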


The lower bound can be proven using a construction presented in [10,
Theorem 2] to show that a problem called continuous soundness on acyclic workflow nets is
coNP-hard. We say that a workflow net is continuously sound iff for all $m$ such
that $\{\mathsf{i}\colon 1\} \to^*_{\mathbb{Q}_{\ge 0}} m$, it holds that $m \to^*_{\mathbb{Q}_{\ge 0}} \{\mathsf{f}\colon 1\}$. The reduction can be used
as is to show that generalised soundness of nets with linear termination time is
coNP-hard, but the proof differs slightly. See the appendix for more details.


6 Termination Time and Concurrent Semantics 


Note that in Petri nets, transitions may be fired concurrently. Thus, in a sense, 
our definition of termination time may overestimate the termination time. 

In this section we investigate parallel executions for workflow nets. Whereas 
the termination time is focused on the worst case sequential execution, now we 
are interested in finding the best case parallel executions. Thus, we provide an 
optimistic lower bound on the execution time to contrast the pessimistic upper 
bound investigated in Sect. 3 and Sect. 4. 


Definition 2. Given a Petri net $N$ let $\pi = t_1 t_2 \dots t_n \in \mathit{Runs}^k_N$ for some $k \in \mathbb{N}$.
A block in $\pi$ is a contiguous subsequence of $\pi$, i.e. $t_a, \dots, t_b$ for some $1 \le a \le b \le n$. We
define the parallel execution of $\pi$ with respect to $k$ as a decomposition of $\pi$ into
blocks $\pi = \pi_1 \pi_2 \dots \pi_\ell$ such that


1. all transitions are pairwise different in a single block; and
2. ${}^\bullet R_{\pi_i} \le \{\mathsf{i}\colon k\} + \sum_{j < i} \Delta(\pi_j)$ for every $1 \le i \le \ell$.


The execution time of a parallel execution is denoted as $\mathit{exec}(\pi_1 \pi_2 \dots \pi_\ell) = \ell$.

Example 6. We consider parallel executions of the run $t_1 t_2 t_1 t_2 t_3 t_3$ with respect
to 4 initial tokens. The run can be decomposed into $(t_1 t_2)(t_1 t_2)(t_3)(t_3)$ but also
into $(t_1)(t_2 t_1)(t_2 t_3)(t_3)$. Both executions have execution time 4. The parallel
execution $(t_1 t_2)(t_1 t_2 t_3)(t_3)$ has execution time 3. $\triangleleft$


We are interested in finding the parallel executions of a run that minimise
the execution time. It turns out that the so-called greedy parallel execution is
such a minimal parallel execution. Given $\pi$ and $k$ it is defined inductively on the
prefix of $\pi$. Suppose we already have some blocks $\pi_1 \dots \pi_{i-1}$. To construct block
$\pi_i$, we simply choose the maximal sequence of transitions immediately following
the last block $\pi_{i-1}$ that satisfies the two conditions of Definition 2. In particular
the last partition in Example 6 is the greedy parallel execution.
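
The greedy construction can be sketched in a few lines of Python. The net below is hypothetical: it is not the net drawn next to Example 6, but one of our own on which the run $t_1 t_2 t_1 t_2 t_3 t_3$ from 4 initial tokens admits the same decompositions. The function assumes that the given run is enabled from $\{\mathsf{i}\colon k\}$.

```python
# Hypothetical net (illustration only): t1: i -> p1, t2: i -> p2, t3: p1 + p2 -> f
PLACES = ["i", "p1", "p2", "f"]
NET = {
    "t1": ({"i": 1}, {"p1": 1}),
    "t2": ({"i": 1}, {"p2": 1}),
    "t3": ({"p1": 1, "p2": 1}, {"f": 1}),
}

def greedy_blocks(run, k):
    """Greedy parallel execution of `run` from {i: k}; assumes `run` is enabled."""
    marking = {p: 0 for p in PLACES}       # marking before the open block
    marking["i"] = k
    blocks, block = [], []
    need = {p: 0 for p in PLACES}          # summed pre-sets of the open block
    for t in run:
        pre, _ = NET[t]
        # Condition 1: t must not repeat in the block.
        # Condition 2: the summed pre-sets must fit into the marking before the block.
        fits = t not in block and all(need[p] + w <= marking[p] for p, w in pre.items())
        if not fits:
            # Close the open block and apply its effect to the marking.
            for u in block:
                u_pre, u_post = NET[u]
                for p, w in u_pre.items():
                    marking[p] -= w
                for p, w in u_post.items():
                    marking[p] += w
            blocks.append(block)
            block, need = [], {p: 0 for p in PLACES}
        block.append(t)
        for p, w in pre.items():
            need[p] += w
    if block:
        blocks.append(block)
    return blocks

blocks = greedy_blocks(["t1", "t2", "t1", "t2", "t3", "t3"], 4)
print(blocks, len(blocks))
```

Because both conditions of Definition 2 are preserved when a block is shortened, extending the open block one transition at a time until a condition fails yields the maximal block required by the greedy definition.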


Lemma 10. Consider a run $\pi$ and $k \in \mathbb{N}$. The greedy parallel execution of $\pi$
has the smallest execution time among all parallel executions of $\pi$ with respect
to $k$.


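Consider a workflow net $N$ with the initial marking $\{\mathsf{i}\colon k\}$. Let $S_k := \{\pi \mid
\{\mathsf{i}\colon k\} \to^{\pi} \{\mathsf{f}\colon k\}\}$. We define $\mathit{MinTime}_N(k)$ as the minimal execution time
among parallel executions of runs in $S_k$. If $S_k = \emptyset$ then $\mathit{MinTime}_N(k) = +\infty$.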


Lemma 11. Let $N$ be a workflow net and let $k, x \in \mathbb{N}$. Deciding whether
$\mathit{MinTime}_N(k) \le x$ is PSPACE-hard even if we fix $k = 1$.


As computing $\mathit{MinTime}_N(k)$ is computationally hard, we modify the question
and ask about the asymptotic behaviour (similarly to Sect. 4). Thus, we are
interested in computing $\lim_{k \to \infty} \frac{\mathit{MinTime}_N(k)}{k}$. The problem is well defined as
the limit exists. This is interesting as $\lim_{k \to \infty} \frac{\mathit{MinTime}_N(k)}{k}$ corresponds to the
average processing time of a single token when the workflow runs (informally
speaking) at its maximal efficiency.


Theorem 4. For a given nonredundant, generalised sound workflow net⁴ $N$ we
can compute $\lim_{k \to \infty} \frac{\mathit{MinTime}_N(k)}{k}$ in polynomial time.


Proof (sketch). The main idea relies on the continuous semantics,
similarly to the proof of Theorem 2. We show that the limit is equal to the
infimum over execution times⁵ of continuous runs $\{\mathsf{i}\colon 1\} \to^{\pi}_{\mathbb{Q}_{\ge 0}} \{\mathsf{f}\colon 1\}$. Then we
prove the following claim.


⁴ These assumptions can be relaxed to a net good for $\{\mathsf{f}\colon 1\}$, see Definition 1.
⁵ For a suitably defined parallel execution and execution time of continuous runs.

Claim 2. Let $v \in \mathbb{Q}^T_{\ge 0}$. Let $S_v = \{\pi \mid \{\mathsf{i}\colon 1\} \to^{\pi}_{\mathbb{Q}_{\ge 0}} \{\mathsf{f}\colon 1\} \text{ and } R_\pi = v\}$. If $S_v \neq \emptyset$
then the infimum over optimal execution time of runs in $S_v$ equals $\|v\|$.


Let $S$ be the set of Parikh images of continuous runs from $\{\mathsf{i}\colon 1\}$ to $\{\mathsf{f}\colon 1\}$.
We define $f \colon \overline{S} \to \mathbb{Q}_{\ge 0}$ such that $f(v) = \|v\|$. Thus we can reformulate the
problem as computing $\inf\{f(v) \mid v \in S\}$. The function $f$ is continuous, thus
we can reformulate further as computing $\inf\{f(v) \mid v \in \overline{S}\}$. The function $f$ is
not linear on $\overline{S}$, but it is piecewise linear. We define $S_t \subseteq \overline{S}$ for $t \in T$ as follows:
$S_t = \{v \mid v \in \overline{S} \text{ and } v(t) \ge v(t') \text{ for all } t' \in T\}$. Observe that $f$ is linear over
$S_t$ for every $t \in T$ and that $\overline{S} = \bigcup_{t \in T} S_t$. Thus we can rephrase our problem as
computing the minimum over the set $\{\inf\{v(t) \mid v \in S_t\} \mid t \in T\}$.

Thus it is sufficient to show that $\inf\{v(t) \mid v \in S_t\}$ can be computed in
polynomial time for any $t \in T$. Lemma 8 allows us to characterize $\overline{S}$ as follows: $v \in \overline{S}$
iff $\Delta(T)v = \{\mathsf{f}\colon 1\} - \{\mathsf{i}\colon 1\}$ and $v \ge 0$. In consequence, $S_t$ can be characterized
as the set of solutions of the following system of inequalities


$\Delta(T)v = \{\mathsf{f}\colon 1\} - \{\mathsf{i}\colon 1\}$ and $v \ge 0$ and $v(t) \ge v(t')$ for all $t' \in T$.


This allows us to capture $\{\inf\{v(t) \mid v \in S_t\} \mid t \in T\}$ as an LP problem which
can be solved in polynomial time.
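
Written out, the computation consists of one LP per transition. The first display states that family as described above. The second display is a standard single-LP reformulation with an auxiliary variable $z$ that bounds all entries of $v$; this variant is our own addition for illustration and is not taken from the text.

```latex
% One LP for each transition t in T; the final answer is the minimum of the |T| optima.
\begin{align*}
\text{minimise }   & v(t) \\
\text{subject to } & \Delta(T)\,v = \{\mathsf{f}\colon 1\} - \{\mathsf{i}\colon 1\}, \\
                   & v \ge 0, \qquad v(t) \ge v(t') \ \text{ for all } t' \in T .
\end{align*}

% Equivalent single LP (assumption: standard min-max reformulation, not from the text).
\begin{align*}
\text{minimise }   & z \\
\text{subject to } & \Delta(T)\,v = \{\mathsf{f}\colon 1\} - \{\mathsf{i}\colon 1\}, \\
                   & v \ge 0, \qquad z \ge v(t') \ \text{ for all } t' \in T .
\end{align*}
```

Both formulations minimise $\max_{t \in T} v(t)$ over the same feasible set, so they have the same optimal value.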


7 Experimental Evaluation 


We have implemented prototypes of several procedures outlined in the paper,
namely procedures to 1) decide termination; 2) decide soundness for
terminating nets; 3) compute $a_N$ for terminating nets; and 4) compute $\mathit{MinTime}_N(1)$,
$\mathit{MaxTime}_N(1)$, and decide 1-soundness for nets with known $a_N$. The idea
behind all procedures is to use our results to encode the properties in LPs/ILPs.
To solve these programs, we utilize the MILP solver Gurobi [24].

For 1), recall Lemma 2, which states that non-termination of a workflow net
$N$ is equivalent to the existence of a nonzero Parikh image $R \in \mathbb{N}^T$ with $\Delta(R) \ge 0$. We
can instead search for $R \in \mathbb{Q}^T$, as any solution could be scaled up to an integral
one. Thus, we can encode this condition as an LP in a straightforward manner,
and decide termination in polynomial time.
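
One possible way to phrase this feasibility problem is shown below. The normalisation $\sum_t R(t) \ge 1$ is our own choice for expressing that $R$ is nonzero with non-strict inequalities only; it is justified by the scaling argument above, and the prototype may encode the condition differently.

```latex
% N (nonredundant) is non-terminating iff the following LP is feasible.
\begin{align*}
\text{find }       & R \in \mathbb{Q}^T \\
\text{subject to } & \Delta(T)\,R \ge 0, \\
                   & R \ge 0, \qquad \sum_{t \in T} R(t) \ge 1 .
\end{align*}
```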

For 2), we use $\mathit{ILP}_{N,p_1,\dots,p_n}$, as defined in the proof of Theorem 3.
A solution to $\mathit{ILP}_{N,p_1,\dots,p_n}$ yields a run $\pi$ such that there exists $k \in \mathbb{N}$ with
$\{\mathsf{i}\colon k\} \to^{\pi}_{\mathbb{Z}} m$, where $m$ is a deadlock.

We also consider continuous instead of integral variables. Then solutions
relate to runs over $\to_{\mathbb{Q}}$ instead. As hinted at in the last sentence of Lemma 9,
both variants yield equivalent results on nets without arc weights, i.e. $\|N\| \le 1$.
However, continuous variables are generally easier to handle for MILP solvers.
For brevity, by integer deadlocks we refer to the approach using integer variables,
and by continuous deadlocks to the approach with continuous variables.


⁶ This observation and the general approach come from [30].

For 3), recall the LP given in Claim 1. We can use it to compute $\mathit{sup}_{f,N}$
for any aggregate $f$, so in particular we can use it to compute $\mathit{sup}_{f_{all},N}$, which
is equal to $a_N$ by Proposition 1 (with $\mathit{sup}_{f,N}$ as defined in Equation (2)). Here, it only
remains to mention that Gurobi allows not only checking feasibility of systems of linear
inequalities, but further allows optimizing an objective value, as required by the LP.

For 4), note that if we have the bound $a_N$ on the length of runs from $\{\mathsf{i}\colon 1\}$, we
can check properties by unrolling runs. The intuition is as follows. We have
$\lfloor a_N \rfloor \cdot |T|$ integer variables. For step $j$ of the run, we have variables $x_{1,j}, x_{2,j}, \dots, x_{|T|,j}$.
The variables for a step encode which transition(s) are fired in that step. We
ensure that we encode a run by requiring $\sum_{i=1}^{|T|} x_{i,j} \le 1$ for all $j \in [1..\lfloor a_N \rfloor]$. We
use integer variables, so either one or no transition is fired in each step.

Alternatively, we encode a parallel execution by imposing the requirements
of Definition 2 on steps. We further specify that for all $j \in [1..\lfloor a_N \rfloor]$, it holds
that $\{\mathsf{i}\colon 1\} + \sum_{j' \le j} \sum_{i=1}^{|T|} \Delta(t_i)\, x_{i,j'} \ge 0$, thus the marking reached so far after
each step is nonnegative. To compute $\mathit{MinTime}_N(1)$/$\mathit{MaxTime}_N(1)$, we
minimise/maximise the number of blocks/steps with non-zero transition variables.
For 1-soundness, we require reaching a deadlock different from $\{\mathsf{f}\colon 1\}$.

Our prototype is implemented in C#. All experiments were run on an 8-Core
Intel® Core™ i7-7700 CPU @ 3.60 GHz with Ubuntu 18.04. We limited
memory to ~8 GB. The time was limited to 60 s for checking termination and
generalised soundness as well as for computing $a_N$. It was limited to 15 s for
computing $\mathit{MinTime}_N(1)$, $\mathit{MaxTime}_N(1)$ and for checking 1-soundness.


7.1 Benchmark Suite 


We use a popular benchmark suite of 1386 free-choice nets originating from
models created in the IBM WebSphere Business Modeler. The instances were
originally introduced in [18] and have frequently been studied since, see [13,37,38].
The nets use a slightly different formalisation of workflow nets that allows
multiple final places, which can be transformed to standard workflow nets using
a technique from [29]. This technique adds transitions, thus can increase $a_N$,
$\mathit{MinTime}_N$ and $\mathit{MaxTime}_N$. Unfortunately, 4 instances cannot be transformed
to workflow nets with this technique, so we remove them. We also apply a set
of well-known reduction rules from [13] that reduce the size of instances while
keeping all types of soundness intact, and remove instances that are trivially
sound after reduction. These rules never increase $a_N$. While they in theory
could increase $\mathit{MinTime}_N$, this does not happen on our benchmarks. Due to
the nature of the reduction rules, it may not be appropriate to run them before
analyzing $\mathit{MinTime}_N$, $\mathit{MaxTime}_N(1)$ and $a_N$, since these numbers then give
no information about the original workflow. Thus we only run experiments on
the reduced instances when we check soundness and termination.

In total, we are left with 1382 unreduced and 740 non-trivial reduced
instances. Statistics about the sizes of the workflow nets can be seen in the
columns under Net Size in Fig. 5. The reduced nets are much smaller than the
unreduced ones, even when the nets are not reduced to the trivial net.

                             Net Size        Analysis Time (in ms)
                             |P|     |T|     Termination  Continuous   Integer   Continuous
                                                          Deadlock     Deadlock  Soundness [10]
Unreduced instances  Mean    48.78   33.07   4.09         [illegible]  12.8      2022.54
                     Median  37      26      3            5            11        88
                     Max     274     285     23           85           88        55707
Reduced instances    Mean    7.43    5.49    2.99         2.3          8.88      44.51
                     Median  6       5       3            2            8         33
                     Max     33      22      5            18           39        99

                                   Total   Deadlocking (not generalised sound)
Unreduced instances  Terminating   1262    523
                     Nonterm.      120     53
Reduced instances    Terminating   694     536
                     Nonterm.      46      23


Fig. 5. Top: Statistics on the net size, and analysis times for deciding termination, 
and checking generalised soundness via deadlocks and continuous soundness. Bot- 
tom: Statistics on the number of terminating/non-terminating and deadlocking/non- 
deadlocking (thus generalised unsound/generalised sound) nets. 


7.2 Termination and Deadlocks 


The time taken to decide termination is shown in the column labelled
"Termination" in the top table of Fig. 5. The numbers of nets that are terminating and
non-terminating are shown in the bottom table of Fig. 5. Among both the
unreduced and reduced instances, the vast majority are terminating (about 90%).
Note that the reduction rules can remove nontermination, even when they do
not make the net trivial, thus the prevalence of terminating instances is even
stronger among the reduced instances. In terms of analysis time, termination
can be decided in under 25 ms for all instances, with a median of 3 ms.

The top of Fig. 5 shows the analysis times for generalised soundness. We use
three algorithms. Columns "Continuous Deadlock" and "Integer Deadlock" show
results for our two proposed approaches, and column "Continuous Soundness"
shows the performance of a state-of-the-art approach [10] for deciding generalised
soundness. Note that both approaches may claim an unsound workflow net to
be sound, but they are precise on different classes of nets. The absence of integer
deadlocks is equivalent to generalised soundness on terminating nets, see Lemma
9. Similarly, continuous soundness is equivalent to generalised soundness on
free-choice nets [10].

In practice, it turns out that our approach for checking the absence of
integer deadlocks is faster than the existing approach using continuous soundness on
every single instance. Continuous soundness times out on 215 of the unreduced
instances (not listed in the table), but neither of the approaches utilizing
deadlocks times out on any instance. The performance of continuous soundness is
not surprising: continuous soundness is checked by passing an $\exists\forall$-formula from
FO($\mathbb{Q}$, <, +) to an SMT solver. Quantifier alternation increases the complexity
of validating such formulas [23]. In comparison, our check for integer deadlocks
is implemented using standard ILP techniques, and thus an existential formula.

The bottom of Fig. 5 shows how many nets are non-terminating, as well as how many
are deadlocking (thus not generalised sound). Recall that integer deadlocks and
continuous deadlocks are equivalent for nets without arc weights, which all of
our nets are. Both types of deadlocks are fast to compute, taking less than 90 ms
on each instance. In practice, checking for continuous deadlocks may be useful
even for nets with arc weights, since their absence also proves the absence of
integer deadlocks. About 50% of the unreduced instances and roughly 75% of
the reduced instances are deadlocking. Note that the reduction rules can only
make sound instances trivial, which are by definition not able to reach a deadlock.


7.3 $a_N$, $\mathit{MinTime}_N(1)$ and $\mathit{MaxTime}_N(1)$


The top of Fig. 6 shows the distribution of $a_N$. This number depends on the number
of transitions, so it is hard to put into context. We instead display $\ell := a_N/|T|$.
Intuitively, that number is an upper bound on the average of how many times
each transition can be fired per initial token. For example, a net with $\ell = 1$
likely is linear, i.e. each transition can be fired only once per initial token, while
nets with $\ell \gg 1$ may exhibit more complex behaviour, and nets with $\ell \ll 1$
may exhibit high degrees of choice, where runs only visit a part of the net. We
group nets with similar $\ell$ to give an idea of the distribution of the values of $\ell$
across instances. Our computation of $a_N$ ran out of memory on 8 nets, so the
figure displays only 1254 nets. Most nets have $\ell \le 1$, with a significant number
having in particular $\ell = 1$. The maximal $\ell$ is 5.83 among unreduced and 4.33
among reduced instances, while the minimal $\ell$ is 0.17 and 0.11 respectively.

To display $\mathit{MinTime}_N(1)$ and $\mathit{MaxTime}_N(1)$, we also divide them by the
number of transitions, as we did for $a_N$. We write $I_{\min} = \mathit{MinTime}_N(1)/|T|$
and $I_{\max} = \mathit{MaxTime}_N(1)/|T|$. We are mostly interested in their difference $D :=
I_{\max} - I_{\min}$. For nets with large $D$, the difference between the pessimistic
sequential and optimistic parallel execution time is large, thus they might allow
a high degree of parallelism. On the contrary, if nets have very small $D$, they
have a sequential structure. We again group nets with similar $D$, as we did for
$\ell$ above. The results of the analysis are shown in the middle table of Fig. 6.

As we divide by $|T|$ in the definition of $D$, it would be unusual for it to
take on huge values, and indeed all nets have $D < 1$. Note that even $D = 0.5$ is
significant, as it means that $\mathit{MinTime}_N(1)$ and $\mathit{MaxTime}_N(1)$ differ by half the
number of transitions. The table totals only 700 nets. On 111 nets, computing
$\mathit{MinTime}_N(1)$ times out, while on 32 nets computing $\mathit{MaxTime}_N(1)$ times out,
and both time out on 51 nets. On the remaining 360 nets, there is no execution
from $\{\mathsf{i}\colon 1\}$ to $\{\mathsf{f}\colon 1\}$, thus $\mathit{MinTime}_N(1) = \infty$.

The analysis times for computing aw, MinTimey(1) and MaxTimey;(1) are 
shown in the bottom table of Fig. 6. We group nets by their size |N] = |P|+|T| to 
show how the analysis times depend on the instance size. We only list 1060 nets, 




Buckets B          | (0, 0.75) | [0.75, 1) | [1, 1] | (1, 1.75) | [1.75, ∞)
Count with ℓ ∈ B   |    303    |    274    |  422   |    173    |    82

Buckets B          | [0, 0.05) | [0.05, 0.15) | [0.15, 0.3) | [0.3, 0.5) | [0.5, 1)
Count with D ∈ B   |    29     |     222      |     295     |    120     |    34

Buckets B                              | [0, 20) | [20, 60) | [60, 150) | [150, 405)
Count with |N| ∈ B                     |   241   |   391    |    388    |     40
Analysis time for computing     Mean   |  11.9   |   9.56   |   9.65    |    9.8
a_N (in ms)                     Median |    7    |    7     |     8     |     8
                                Max    |   714   |   246    |    289    |     33
Analysis time for computing     Mean   |  8.29   |  120.52  |  1610.44  |  2128.83
MinTime_N(1) (in ms)            Median |    8    |    36    |    307    |   1454
                                Max    |   14    |   6599   |   14905   |   12160
Analysis time for computing     Mean   |  3.99   |  44.23   |  669.66   |   5305.5
MaxTime_N(1) (in ms)            Median |    4    |    29    |    173    |   4934
                                Max    |    8    |   2561   |   12370   |   14954


Fig. 6. Top: Statistics on the distribution of ℓ. Middle: Statistics on the distribution
of D. Bottom: Statistics on the analysis times for a_N, T_min and T_max.


as we omit those where the computation of MinTime_N(1) or MaxTime_N(1)
timed out. One interesting observation is that for most instances, particularly
small ones, MinTime_N(1) is harder to compute than MaxTime_N(1). However,
both are very slow to compute compared to a_N, which indeed never times out
on our instances. In fact, a_N takes at most 714 ms to compute for any instance.
It is interesting that the time for computing a_N does not seem to depend highly
on the net size. We suspect this might be partly due to the fact that a_N tends
to be proportionally smaller for larger instances: bucket [0, 20) has a mean ℓ of
1.04, while the mean is 0.86 for bucket [150, 405).


7.4 1-Soundness 


Lastly, we briefly comment on the time for deciding 1-soundness via unrolling
for nets with known a_N. The procedure times out for 71 instances, among which
a_N has a mean of 133.88 and a maximum of 256. It takes a mean of 612.66 ms
and a maximum of 14431 ms to decide 1-soundness in this way. Unlike in the
case of generalised soundness, our procedure for 1-soundness does not seem to
be able to compete with the state of the art. In [18], 1-soundness is decided for
many of our instances in a few milliseconds per instance, whereas our approach
achieves this only for instances with small a_N (up to about 25).




P. Hofman et al. 


References 


1. van der Aalst, W.M.P.: Verification of workflow nets. In: Azéma, P., Balbo, G. (eds.)
    ICATPN 1997. LNCS, vol. 1248, pp. 407-426. Springer, Heidelberg (1997).
    https://doi.org/10.1007/3-540-63139-9_48
2. van der Aalst, W.M.P., et al.: Soundness of workflow nets: classification,
    decidability, and analysis. Formal Aspects Comput. 23(3), 333-363 (2011).
    https://doi.org/10.1007/s00165-010-0161-4
3. van der Aalst, W.M.: A class of Petri net for modeling and analyzing business
    processes. Comput. Sci. Rep. 95(26), 1-25 (1995)
4. van der Aalst, W.M.P., Hirnschall, A., Verbeek, H.M.W.: An alternative way to
    analyze workflow graphs. In: Pidduck, A.B., Özsu, M.T., Mylopoulos, J., Woo, C.C.
    (eds.) CAiSE 2002. LNCS, vol. 2348, pp. 535-552. Springer, Heidelberg (2002).
    https://doi.org/10.1007/3-540-47961-9_37
5. Bérard, B., Cassez, F., Haddad, S., Lime, D., Roux, O.H.: Comparison of different
    semantics for time Petri nets. In: Peled, D.A., Tsay, Y.-K. (eds.) ATVA 2005.
    LNCS, vol. 3707, pp. 293-307. Springer, Heidelberg (2005).
    https://doi.org/10.1007/11562948_23
6. Blondin, M., Finkel, A., Haase, C., Haddad, S.: Approaching the coverability
    problem continuously. In: Chechik, M., Raskin, J.-F. (eds.) TACAS 2016. LNCS,
    vol. 9636, pp. 480-496. Springer, Heidelberg (2016).
    https://doi.org/10.1007/978-3-662-49674-9_28
7. Blondin, M., Finkel, A., Haase, C., Haddad, S.: The logical view on continuous Petri
    nets. ACM Trans. Comput. Logic (TOCL) 18(3), 24:1-24:28 (2017).
    https://doi.org/10.1145/3105908
8. Blondin, M., Haase, C., Offtermatt, P.: Directed reachability for infinite-state
    systems. In: TACAS 2021. LNCS, vol. 12652, pp. 3-23. Springer, Cham (2021).
    https://doi.org/10.1007/978-3-030-72013-1_1
9. Blondin, M., Mazowiecki, F., Offtermatt, P.: The complexity of soundness in
    workflow nets. In: Proceedings of the 37th Symposium on Logic in Computer Science
    (LICS) (2022). https://doi.org/10.1145/3531130.3533341
10. Blondin, M., Mazowiecki, F., Offtermatt, P.: Verifying generalised and structural
    soundness of workflow nets via relaxations. In: Shoham, S., Vizel, Y. (eds.) CAV.
    LNCS, vol. 13372, pp. 468-489. Springer, Cham (2022).
    https://doi.org/10.1007/978-3-031-13188-2_23
11. Brázdil, T., Chatterjee, K., Kučera, A., Novotný, P., Velan, D.: Deciding fast
    termination for probabilistic VASS with nondeterminism. In: Chen, Y.-F., Cheng,
    C.-H., Esparza, J. (eds.) ATVA 2019. LNCS, vol. 11781, pp. 462-478. Springer,
    Cham (2019). https://doi.org/10.1007/978-3-030-31784-3_27
12. Brázdil, T., Chatterjee, K., Kučera, A., Novotný, P., Velan, D., Zuleger, F.:
    Efficient algorithms for asymptotic bounds on termination time in VASS. In: Dawar,
    A., Grädel, E. (eds.) Proceedings of the 33rd Annual ACM/IEEE Symposium on
    Logic in Computer Science, LICS 2018, Oxford, UK, 09-12 July 2018, pp. 185-194.
    ACM (2018). https://doi.org/10.1145/3209108.3209191
13. Bride, H., Kouchnarenko, O., Peureux, F.: Reduction of workflow nets for
    generalised soundness verification. In: Bouajjani, A., Monniaux, D. (eds.) VMCAI
    2017. LNCS, vol. 10145, pp. 91-111. Springer, Cham (2017).
    https://doi.org/10.1007/978-3-319-52234-0_6
14. Czerwiński, W., Orlikowski, Ł.: Reachability in vector addition systems is
    Ackermann-complete. In: Proceedings 62nd Annual IEEE Symposium on
    Foundations of Computer Science (FOCS) (2021)


15. Desel, J., Esparza, J.: Free Choice Petri Nets. Cambridge University Press (1995).
    https://doi.org/10.1017/CBO9780511526558
16. Dixon, A., Lazić, R.: KReach: a tool for reachability in Petri nets. In: TACAS
    2020. LNCS, vol. 12078, pp. 405-412. Springer, Cham (2020).
    https://doi.org/10.1007/978-3-030-45190-5_22
17. van Dongen, B.F., de Medeiros, A.K.A., Verbeek, H.M.W., Weijters, A.J.M.M.,
    van der Aalst, W.M.P.: The ProM framework: a new era in process mining tool
    support. In: Ciardo, G., Darondeau, P. (eds.) ICATPN 2005. LNCS, vol. 3536, pp.
    444-454. Springer, Heidelberg (2005). https://doi.org/10.1007/11494744_25
18. Fahland, D., et al.: Instantaneous soundness checking of industrial business process
    models. In: Dayal, U., Eder, J., Koehler, J., Reijers, H.A. (eds.) BPM 2009. LNCS,
    vol. 5701, pp. 278-293. Springer, Heidelberg (2009).
    https://doi.org/10.1007/978-3-642-03848-8_19
19. Fraca, E., Haddad, S.: Complexity analysis of continuous Petri nets. Fundamenta
    Informaticae 137(1), 1-28 (2015). https://doi.org/10.3233/FI-2015-1168
20. Freytag, T., Allgaier, P., Burattin, A., Danek-Bulius, A.: WoPeD - a
    “proof-of-concept” platform for experimental BPM research projects. In: 15th
    International Conference on Business Process Management (BPM 2017) (2017)
21. Geffroy, T., Leroux, J., Sutre, G.: Occam’s razor applied to the Petri net
    coverability problem. Theor. Comput. Sci. 750, 38-52 (2018).
    https://doi.org/10.1016/j.tcs.2018.04.014
22. German, S.M., Sistla, A.P.: Reasoning about systems with many processes. J. ACM
    39(3), 675-735 (1992). https://doi.org/10.1145/146637.146681
23. Grigoriev, D.: Complexity of deciding Tarski algebra. J. Symb. Comput. 5(1/2),
    65-108 (1988). https://doi.org/10.1016/S0747-7171(88)80006-3
24. Gurobi Optimization, LLC: Gurobi optimizer reference manual (2023).
    https://www.gurobi.com
25. van Hee, K., Oanea, O., Sidorova, N., Voorhoeve, M.: Verifying generalized
    soundness of workflow nets. In: Virbitskaite, I., Voronkov, A. (eds.) PSI 2006.
    LNCS, vol. 4378, pp. 235-247. Springer, Heidelberg (2007).
    https://doi.org/10.1007/978-3-540-70881-0_21
26. van Hee, K., Sidorova, N., Voorhoeve, M.: Soundness and separability of workflow
    nets in the stepwise refinement approach. In: van der Aalst, W.M.P., Best, E.
    (eds.) ICATPN 2003. LNCS, vol. 2679, pp. 337-356. Springer, Heidelberg (2003).
    https://doi.org/10.1007/3-540-44919-1_22
27. van Hee, K., Sidorova, N., Voorhoeve, M.: Generalised soundness of workflow nets is
    decidable. In: Cortadella, J., Reisig, W. (eds.) ICATPN 2004. LNCS, vol. 3099, pp.
    197-215. Springer, Heidelberg (2004).
    https://doi.org/10.1007/978-3-540-27793-4_12
28. Hopcroft, J., Pansiot, J.J.: On the reachability problem for 5-dimensional vector
    addition systems. Theoret. Comput. Sci. 8(2), 135-159 (1979)
29. Kiepuszewski, B., ter Hofstede, A.H.M., van der Aalst, W.M.P.: Fundamentals of
    control flow in workflows. Acta Informatica 39(3), 143-209 (2003).
    https://doi.org/10.1007/s00236-002-0105-4
30. Kosaraju, S.R., Sullivan, G.F.: Detecting cycles in dynamic graphs in polynomial
    time (preliminary version). In: Simon, J. (ed.) Proceedings of the 20th Annual
    ACM Symposium on Theory of Computing, 2-4 May 1988, Chicago, Illinois, USA,
    pp. 398-406. ACM (1988). https://doi.org/10.1145/62212.62251


31. Kučera, A., Leroux, J., Velan, D.: Efficient analysis of VASS termination
    complexity. In: Hermanns, H., Zhang, L., Kobayashi, N., Miller, D. (eds.) LICS 2020:
    35th Annual ACM/IEEE Symposium on Logic in Computer Science, Saarbrücken,
    Germany, 8-11 July 2020, pp. 676-688. ACM (2020).
    https://doi.org/10.1145/3373718.3394751
32. Leroux, J.: Polynomial vector addition systems with states. In: Chatzigiannakis,
    I., Kaklamanis, C., Marx, D., Sannella, D. (eds.) 45th International Colloquium on
    Automata, Languages, and Programming, ICALP 2018, 9-13 July 2018, Prague,
    Czech Republic. LIPIcs, vol. 107, pp. 134:1-134:13. Schloss Dagstuhl -
    Leibniz-Zentrum für Informatik (2018).
    https://doi.org/10.4230/LIPIcs.ICALP.2018.134
33. Leroux, J.: The reachability problem for Petri nets is not primitive recursive. In:
    Proceedings 62nd Annual IEEE Symposium on Foundations of Computer Science
    (FOCS) (2021)
34. Leroux, J., Schmitz, S.: Reachability in vector addition systems is
    primitive-recursive in fixed dimension. In: Proceedings 34th Symposium on Logic in
    Computer Science (LICS) (2019). https://doi.org/10.1109/LICS.2019.8785796
35. Leroux, J., Schnoebelen, P.: On functions weakly computable by Petri nets and
    vector addition systems. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) RP 2014.
    LNCS, vol. 8762, pp. 190-202. Springer, Cham (2014).
    https://doi.org/10.1007/978-3-319-11439-2_15
36. Lipton, R.: The reachability problem requires exponential space. Department of
    Computer Science, Yale University, vol. 62 (1976)
37. Meyer, P.J., Esparza, J., Offtermatt, P.: Computing the expected execution time of
    probabilistic workflow nets. In: Vojnar, T., Zhang, L. (eds.) TACAS 2019. LNCS,
    vol. 11428, pp. 154-171. Springer, Cham (2019).
    https://doi.org/10.1007/978-3-030-17465-1_9
38. Meyer, P.J., Esparza, J., Völzer, H.: Computing the concurrency threshold of sound
    free-choice workflow nets. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS,
    vol. 10806, pp. 3-19. Springer, Cham (2018).
    https://doi.org/10.1007/978-3-319-89963-3_1
39. Murata, T.: Petri nets: properties, analysis and applications. Proc. IEEE 77(4),
    541-580 (1989). https://doi.org/10.1109/5.24143
40. Praveen, M., Lodaya, K.: Analyzing reachability for some Petri nets with fast
    growing markings. Electron. Notes Theor. Comput. Sci. 223, 215-237 (2008).
    https://doi.org/10.1016/j.entcs.2008.12.041
41. Rackoff, C.: The covering and boundedness problems for vector addition
    systems. Theor. Comput. Sci. 6, 223-231 (1978).
    https://doi.org/10.1016/0304-3975(78)90036-1
42. Schmitz, S.: The complexity of reachability in vector addition systems. ACM
    SIGLOG News 3(1), 4-21 (2016). https://doi.org/10.1145/2893582.2893585
43. Valk, R., Vidal-Naquet, G.: Petri nets and regular languages. J. Comput. Syst. Sci.
    23(3), 299-325 (1981). https://doi.org/10.1016/0022-0000(81)90067-2
44. Verbeek, E., van der Aalst, W.M.P.: Woflan 2.0 a Petri-net-based workflow
    diagnosis tool. In: Nielsen, M., Simpson, D. (eds.) ICATPN 2000. LNCS, vol. 1825,
    pp. 475-484. Springer, Heidelberg (2000). https://doi.org/10.1007/3-540-44988-4_28
45. Wolf, K.: Petri net model checking with LoLA 2. In: Khomenko, V., Roux, O.H.
    (eds.) PETRI NETS 2018. LNCS, vol. 10877, pp. 351-362. Springer, Cham (2018).
    https://doi.org/10.1007/978-3-319-91268-4_18




Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter’s Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 




Lincheck: A Practical Framework 
for Testing Concurrent Data Structures 
on JVM 


Nikita Koval¹(✉), Alexander Fedorov¹,², Maria Sokolova¹, Dmitry Tsitelov³,
and Dan Alistarh²

¹ JetBrains, Prague, Czech Republic
ndkoval@ya.ru
² IST Austria, Klosterneuburg, Austria
³ Devexperts, Munich, Germany


Abstract. This paper presents Lincheck, a new practical and user- 
friendly framework for testing concurrent algorithms on the Java Vir- 
tual Machine (JVM). Lincheck provides a simple and declarative way 
to write concurrent tests: instead of describing how to perform the test, 
users specify what to test by declaring all the operations to examine; 
the framework automatically handles the rest. As a result, tests written 
with Lincheck are concise and easy to understand. The framework auto- 
matically generates a set of concurrent scenarios, examines them using 
stress-testing or bounded model checking, and verifies that the results 
of each invocation are correct. Notably, if an error is detected via model 
checking, Lincheck provides an easy-to-follow trace to reproduce it, sig- 
nificantly simplifying the bug investigation. 

To the best of our knowledge, Lincheck is the first production-ready 
tool on the JVM that offers such a simple way of writing concurrent 
tests, without requiring special skills or expertise. We successfully inte- 
grated Lincheck in the development process of several large projects, 
such as Kotlin Coroutines, and identified new bugs in popular concur- 
rency libraries, such as a race in Java’s standard ConcurrentLinkedDeque 
and a liveness bug in Java’s AbstractQueuedSynchronizer framework,
which is used in most of the synchronization primitives. We believe that 
Lincheck can significantly improve the quality and productivity of con- 
current algorithms research and development and become the state-of- 
the-art tool for checking their correctness. 


1 Introduction 


Concurrent programming is known to be notoriously hard and error-prone. Writ- 
ing a good and robust test for a concurrent data structure may be even more 
challenging than implementing it. Programmers produce many such stress tests 
every day, but they often are nondeterministic, cover only specific cases, and do 
not catch all the bugs. Both the industry and academia need a tool that would 
simplify writing reliable tests for concurrent data structures. 
In this paper, we present Lincheck [1], a new practical framework for JVM-based
languages (such as Java, Kotlin, and Scala), which simplifies writing reliable
concurrent tests. While most existing tools require writing the algorithm in
a special language [2], specifying all possible concurrent scenarios and their
outcomes [3-6], or learning a large amount of theory [7,8], Lincheck provides a more
pragmatic declarative approach. It requires users only to list the data structure
operations, thus specifying what to test instead of how. Taking these operations,
Lincheck generates a set of concurrent scenarios and examines them via stress
testing or model checking, verifying that the outcome results are correct. The
default correctness property is linearizability [9], but various relaxations [10-12]
are also supported. One may think of Lincheck as a mix of a fuzzer (that generates
concurrent scenarios) and a model checker or stress runner (which examines
these scenarios) equipped with an automatic outcome verifier.


Lincheck by Example. The “classic” way to write a concurrent test is to man- 
ually run parallel threads, invoking the data structure operations in them and 
checking that some sequential history can explain the produced results. Such 
tests typically contain hundreds of lines of boilerplate code and cover only easy- 
to-verify scenarios. Lincheck automates the machinery, making tests short and 
declarative. To illustrate that, we present a test for the ConcurrentLinkedDeque 
collection (double-ended queue, which supports insertions and removals at both 
ends) of the standard Java library in Listing 1. 

The initial state of the data structure under test is specified in the constructor;
here, we simply create a new empty deque at line 2. Lines 4-9 then declare
the deque operations; they should be annotated with @Operation.
Finally, we run the analysis by invoking ModelCheckingOptions.check(..)
on the testing class at line 11. Replacing ModelCheckingOptions with
StressOptions switches to stress testing, which essentially runs parallel threads.


 1 class DequeTest {
 2   val deque = ConcurrentLinkedDeque<Int>()
 3
 4   @Operation fun addFirst(e: Int) = deque.addFirst(e)
 5   @Operation fun addLast(e: Int) = deque.addLast(e)
 6   @Operation fun pollFirst() = deque.pollFirst()
 7   @Operation fun pollLast() = deque.pollLast()
 8   @Operation fun peekFirst() = deque.peekFirst()
 9   @Operation fun peekLast() = deque.peekLast()
10
11   @Test fun runTest() = ModelCheckingOptions()
12     .check(this::class)
13 }

Listing 1. Concurrent test via Lincheck for Java’s ConcurrentLinkedDeque. The code
is written in Kotlin; import statements are omitted.


After executing the test, we get an error presented in Fig. 1. Surprisingly,
this class from the standard Java library has a bug; the error was originally
detected via Lincheck by the authors [13] (notably, there were several
unsuccessful attempts to fix the incorrectness before that [14,15]). Obviously, the
produced results are non-linearizable: for pollLast() in the second thread to
return -8, it should be called before addLast(-6) in the first thread; however,
that would require the following peekFirst() to return -6 instead of -8. While
Lincheck always prints a failing scenario with incorrect results (if found), the
model checker also provides a detailed interleaving trace that reproduces the
error.

= Invalid execution results =
| addLast(-6)     | addFirst(-8)   |
| peekFirst(): -8 | pollLast(): -8 |

= The following interleaving leads to the error =
|                 | addFirst(-8)                                              |
|                 | pollLast()                                                |
|                 |   pollLast(): -8 at DequeTest.pollLast(DequeTest.kt:35)   |
|                 |     last(): Node@1 at CLD.pollLast(CLD.java:936)          |
|                 |     item.READ: null at CLD.pollLast(CLD.java:938)         |
|                 |     prev.READ: Node@2 at CLD.pollLast(CLD.java:946)       |
|                 |     item.READ: -8 at CLD.pollLast(CLD.java:938)           |
|                 |     next.READ: null at CLD.pollLast(CLD.java:940)         |
|                 |     switch                                                |
| addLast(-6)     |                                                           |
| peekFirst(): -8 |                                                           |
|                 |     item.CAS(-8,null): true at CLD.pollLast(CLD.java:941) |
|                 |     unlink(Node@2) at CLD.pollLast(CLD.java:942)          |
|                 |   result: -8                                              |

[Diagram: the list after addFirst(-8) and addLast(-6), with pollLast() performing
#2.item.CAS(-8, null) followed by unlink(#2).]

Fig. 1. The incorrect execution of Java’s ConcurrentLinkedDeque identified by the
Lincheck test from Listing 1 and illustrated by a diagram. To narrow the test
output, ConcurrentLinkedDeque is replaced with CLD.
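
The non-linearizability argument for the outcome in Fig. 1 can also be checked by brute force. The following is a minimal sketch, not Lincheck's verifier: it enumerates every interleaving of the two threads that preserves the per-thread order, replays it on a sequential java.util.ArrayDeque used as the specification, and looks for one that reproduces the observed results. The names Op and findSequentialHistory are hypothetical, and real-time ordering between threads is ignored because the two threads of this scenario run fully in parallel.

```kotlin
import java.util.ArrayDeque

// One operation of the scenario together with the result observed in the failing run.
// `apply` executes the operation on a sequential deque and returns its result.
data class Op(val name: String, val observed: Any?, val apply: (ArrayDeque<Int>) -> Any?)

// Enumerates all interleavings of the two threads that preserve per-thread order and
// returns the first one whose sequential results equal the observed ones, or null.
fun findSequentialHistory(t1: List<Op>, t2: List<Op>): List<String>? {
    fun go(i: Int, j: Int, deque: ArrayDeque<Int>, history: List<String>): List<String>? {
        if (i == t1.size && j == t2.size) return history
        if (i < t1.size) {
            val copy = ArrayDeque(deque) // replay on a copy so that backtracking is safe
            if (t1[i].apply(copy) == t1[i].observed) {
                go(i + 1, j, copy, history + t1[i].name)?.let { return it }
            }
        }
        if (j < t2.size) {
            val copy = ArrayDeque(deque)
            if (t2[j].apply(copy) == t2[j].observed) {
                go(i, j + 1, copy, history + t2[j].name)?.let { return it }
            }
        }
        return null
    }
    return go(0, 0, ArrayDeque<Int>(), emptyList())
}

fun main() {
    // The scenario and results from Fig. 1.
    val thread1 = listOf(
        Op("addLast(-6)", Unit) { it.addLast(-6) },
        Op("peekFirst()", -8) { it.peekFirst() },
    )
    val thread2 = listOf(
        Op("addFirst(-8)", Unit) { it.addFirst(-8) },
        Op("pollLast()", -8) { it.pollLast() },
    )
    println(findSequentialHistory(thread1, thread2) ?: "no sequential history explains the results")
}
```

Exhaustive enumeration is exponential in the scenario size, so it is only meant to make the argument concrete for this four-operation scenario.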

Providing a detailed and informative trace is a game-changer. With it, we
can easily understand why ConcurrentLinkedDeque is incorrect. The underlying
data structure forms a doubly-linked list, with head and tail pointers
approximating its first and last nodes. Initially, head and tail point to a
logically removed (Node.item == null) node. After addFirst(-8) in the second
thread is applied, a new node is added to the beginning; head and tail
remain unchanged. Then, pollLast() starts; it finds the last non-empty node
(the previously added one) and gets preempted before extracting the element.
(The procedure linearizes on changing the Node.item value to null via an atomic
Compare-and-Set (CAS) instruction.) After invoking addLast(-6) in the first
thread, a new node is added to the end of the list. The following peekFirst()
does not change the data structure logically but advances the head pointer.
Finally, the execution switches back to the second thread. The pollLast()
operation successfully removes the node containing -8 (which is no longer the last
element), extracting the item via CAS followed by unlinking the node physically.
These twelve lines of straightforward code easily find a bug in the standard
library of Java and provide a detailed trace that leads to the error, reducing
the investigation time from hours to minutes. We also believe that with such an
instrument as Lincheck, the bug would not have been released in the first place.


Practical-Oriented Design. Lincheck was designed as a tool for testing
real-world concurrent code. The following properties are crucial in practice:


— Declarative testing. Lincheck takes only a list of operations and optional 
configuration parameters (we discuss them further), which results in short 
and intuitive tests — no need to learn a new language or technology. 

— No implementation restrictions. Lincheck can test any real-world imple- 
mentations, including those that utilize low-level JVM constructs like Unsafe 
or VarHandle, without imposing any restrictions. 

— No false positives. Lincheck reports only reproducible errors, which is vital 
for using the framework in continuous integration (CI/CD) and unit tests. 

— User-friendliness. Lincheck streamlines bug investigation by providing a 
thorough trace of the discovered error, saving programmers countless hours. 

— Flexibility. Lincheck supports popular constraints, such as the
single-producer/consumer workload, as well as a range of linearizability relaxations,
enabling custom scenario generation and verification when necessary.


Real-World Applications. We have successfully integrated Lincheck in the 
development processes of Kotlin Coroutines [16] and JCTools [17] libraries, 
enabling reliable testing of their core data structures, which are often complex 
and several thousand lines of code long. Lincheck’s support of popular work- 
load constraints and linearizability relaxations and its ability to handle blocking 
operations, such as those of Mutex and Channel, were crucial for these tests. 
Furthermore, for over five years, we have successfully used Lincheck in our 
“Parallel Programming” course to automate the verification of more than 4K 
student solutions annually. 

We have also detected several new bugs [18] in popular libraries, including
the previously discussed race in Java’s ConcurrentLinkedDeque [13],
non-linearizability of NonBlockingHashMapLong from JCTools [19], and liveness
bugs in Java’s AbstractQueuedSynchronizer [18] and Mutex in Kotlin
Coroutines [20].

In conclusion, Lincheck is a powerful and versatile tool for testing complex 
concurrent programs. It provides non-trivial features in terms of generality, ease 
of use, and performance. We provide a comprehensive overview of Lincheck in 
the rest of the paper and believe that it will greatly save time and (mental) 
energy tracking down concurrency bugs. 
2 Lincheck Overview


We now dive into Lincheck internals, presenting its key features as we go along. 
The testing process can be broken down into three stages, as depicted in the 
diagram below. Lincheck generates a set of concurrent scenarios and examines 
them via either model checking or stress testing, verifying that each scenario 
invocation results satisfy the desirable correctness property (linearizability [9] 
by default). If the outcome is incorrect, the invocation hangs, or the code throws 
an unexpected exception, the test fails with an error similar to the one in Fig. 1. 


Minimizing Failing Scenarios. When an error is detected, it is often possible
to reproduce it with fewer threads and operations [21]. Lincheck automatically
“minimizes” the failing scenario in a greedy way: it repeatedly removes an
operation from the scenario until the test stops failing, thus finding a minimal failing
scenario. While this approach is not theoretically optimal, we found that it works
well in practice¹.
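
One possible reading of this greedy procedure is sketched below. It is an illustration under stated assumptions rather than Lincheck's implementation: a scenario is modelled as a list of threads holding operation names, and `fails` is a hypothetical callback that re-runs the test on a candidate scenario. The sketch keeps a removal only if the smaller scenario still fails and stops when no single removal preserves the failure.

```kotlin
// A scenario is a list of threads, each holding the names of its operations.
typealias Scenario = List<List<String>>

// Greedy minimisation: drop one operation at a time for as long as the test keeps failing.
// `fails` is a hypothetical callback that re-runs the test on a candidate scenario.
fun minimize(scenario: Scenario, fails: (Scenario) -> Boolean): Scenario {
    var current = scenario
    var shrunk = true
    while (shrunk) {
        shrunk = false
        search@ for (t in current.indices) {
            for (i in current[t].indices) {
                val candidate = current
                    .mapIndexed { idx, ops -> if (idx == t) ops.filterIndexed { k, _ -> k != i } else ops }
                    .filter { it.isNotEmpty() } // drop threads that became empty
                if (candidate.isNotEmpty() && fails(candidate)) {
                    current = candidate // keep the smaller failing scenario and start over
                    shrunk = true
                    break@search
                }
            }
        }
    }
    return current
}

fun main() {
    // Toy failure condition: the "bug" shows up whenever addFirst and pollLast are both present.
    val fails: (Scenario) -> Boolean = { s -> s.flatten().containsAll(listOf("addFirst", "pollLast")) }
    val scenario = listOf(listOf("addLast", "peekFirst", "pollFirst"), listOf("addFirst", "pollLast"))
    println(minimize(scenario, fails))
}
```

The result is minimal only in the sense that no single operation can be removed from it; a smaller failing scenario that is not reachable by removals may still exist.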


User Guide. This section focuses mainly on the technical aspects behind the 
Lincheck features. For those readers who are interested in using the framework 
in their project, we suggest taking a look at the official Lincheck guide [22]. 


2.1 Phase 1: Scenario Generation 


Lincheck allows tuning the number of parallel threads, the number of operations
in each of them, and the number of scenarios to be generated when creating
ModelCheckingOptions or StressOptions. The framework then generates a set of
concurrent scenarios by filling threads with randomly picked operations (annotated
with @Operation) and generating (by default random) arguments for these operations.
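
The generation step can be pictured with the following sketch. It does not use the Lincheck API: Call and generateScenario are hypothetical names, operations are represented by their names only, and every operation is assumed to take a single integer argument.

```kotlin
import kotlin.random.Random

// A generated call: the operation name plus its randomly chosen integer argument.
data class Call(val operation: String, val arg: Int)

// Fills `threads` parallel threads with `opsPerThread` randomly picked operations,
// drawing each argument uniformly from `argRange`.
fun generateScenario(
    operations: List<String>,
    threads: Int,
    opsPerThread: Int,
    argRange: IntRange,
    random: Random,
): List<List<Call>> =
    List(threads) {
        List(opsPerThread) { Call(operations.random(random), argRange.random(random)) }
    }

fun main() {
    val scenario = generateScenario(
        operations = listOf("add", "contains", "remove"),
        threads = 2, opsPerThread = 3, argRange = 1..3, random = Random(42),
    )
    scenario.forEachIndexed { t, calls -> println("thread ${t + 1}: $calls") }
}
```

Passing the random generator explicitly keeps the generated scenarios reproducible for a fixed seed, which is convenient when a failing scenario has to be re-run.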


Operation Arguments. Consider testing a concurrent hash table. If it has a
bug, it is more likely to be detected when accessing the same element
concurrently. To increase the probability of such scenarios, users can narrow the range
of possible elements passed to the operations; Listing 2 illustrates how to
configure the test so that the generated elements are always between 1 and 3.


@Param(name = "elem", gen = IntGen::class, conf = "1:3")
@OpGroupConfig(name = "writer", nonParallel = true)
class SingleWriterHashSetTest {
  val s = SingleWriterHashSet<Int>()

  @Operation(group = "writer") // never executes concurrently
  fun add(@Param(name = "elem") e: Int) = s.add(e)

  @Operation
  fun contains(@Param(name = "elem") e: Int) = s.contains(e)

  @Operation(group = "writer") // never executes concurrently
  fun remove(@Param(name = "elem") e: Int) = s.remove(e)

  @Test fun runTest() = ModelCheckingOptions()
    .check(this::class)
}

Listing 2. Testing a single-writer set with custom argument generation (the @Param
annotations) and a single-writer workload constraint (@OpGroupConfig together with
group = "writer").

¹ Finding the minimum failing scenario is a highly complex problem, as it could be
not based on any of the generated scenarios.


Workload Constraints. Some data structures require that certain operations
never execute concurrently, as in single-producer/consumer queues.
Lincheck provides out-of-the-box support for such constraints, generating
scenarios accordingly. The framework API requires grouping such operations and
restricting their parallelism; Listing 2 illustrates how to test a single-writer set.


2.2 Phase 2: Scenario Running 


Lincheck uses stress testing and model checking to examine generated scenarios. 
The stress testing mode was influenced by JCStress [3], but Lincheck automat- 
ically generates scenarios and verifies outcomes, while JCStress requires listing 
both scenarios and correct results manually. The main issue with stress testing is 
the complexity of analysing a bug after detecting it. To mitigate this, Lincheck 
supports bounded model checking, providing detailed traces that reproduce bugs, 
similar to the one in Fig. 1. The rest of the subsection focuses on the
model-checking approach, discussing the most significant details.


Bounded Model Checker. The model-checking mode has drawn inspiration 
from the CHESS (also known as Line-Up) framework for C# [5]. It assumes 
the sequentially consistent memory model and evaluates all possible schedules 
with a limited number of context switches. Unlike CHESS, Lincheck bounds 
the number of schedules rather than context switches, which makes testing time 
independent of scenario size and algorithm complexity. 

In some cases, the specified number of schedules may not be enough to explore 
all interleavings, so Lincheck studies them evenly, probing logically different sce- 
narios first. For instance, imagine a case where Lincheck is analyzing interleavings 
with a single context switch and has previously explored only one interleaving, 
which originated from the first thread containing four atomic operations. Under 
these circumstances, Lincheck presumes that 25% of the interleavings have been 
explored when starting from the first thread, while the second thread remains 
unexplored. As a result, Lincheck becomes more inclined to select the second 
thread as the starting point for the next exploration. 
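
The preference described above can be illustrated with a small weighted choice. The text does not state the exact selection rule, so everything below is an assumption made for illustration: each starting thread is weighted by its unexplored fraction, and the second thread is assumed to contain four atomic operations as well. Start and pickStart are hypothetical names.

```kotlin
import kotlin.random.Random

// Exploration statistics for the interleavings that start in a given thread.
data class Start(val thread: Int, val switchPositions: Int, val explored: Int) {
    val exploredFraction: Double get() = explored.toDouble() / switchPositions
}

// Picks a starting thread with probability proportional to its unexplored fraction.
fun pickStart(starts: List<Start>, random: Random): Start {
    val weights = starts.map { 1.0 - it.exploredFraction }
    val total = weights.sum()
    require(total > 0.0) { "all interleavings are already explored" }
    var r = random.nextDouble() * total
    for ((start, weight) in starts.zip(weights)) {
        r -= weight
        if (r < 0.0) return start
    }
    // Only reachable through floating-point rounding: fall back to an unexplored start.
    return starts.last { it.exploredFraction < 1.0 }
}

fun main() {
    // Thread 1: one of four switch positions explored (25%); thread 2: nothing explored yet.
    val starts = listOf(
        Start(thread = 1, switchPositions = 4, explored = 1),
        Start(thread = 2, switchPositions = 4, explored = 0),
    )
    val random = Random(0)
    val picks = List(10_000) { pickStart(starts, random).thread }
    println("thread 2 chosen in ${picks.count { it == 2 }} of ${picks.size} draws")
}
```

Under these assumptions the weights are 0.75 and 1.0, so the second thread is selected with probability 4/7.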


Switch Points. To control the execution, Lincheck inserts internal method
calls at shared memory accesses by on-the-fly byte-code transformation via the ASM
framework [23]. These internal methods serve as switch points, enabling manual
context switching. Notably, Lincheck supports shared memory access through
AtomicFieldUpdater, VarHandle, and Unsafe and handles built-in synchronization
via MONITORENTER/MONITOREXIT, park/unpark, and wait/notify. Internally,
it replaces these synchronization primitives with custom implementations,
thus enabling full control of the execution.


Progress Guarantees. While exploring potential switch points, Lincheck can
detect active synchronization, handling it similarly to locks. This capability
to detect blocking code enables Lincheck to verify the algorithm under test for
obstruction-freedom², the weakest non-blocking guarantee [10]. Although checks
for the more popular lock-freedom and wait-freedom are part of Lincheck’s future
plans, the majority of practical liveness bugs are caused by unexpected blocking
code, making the obstruction-freedom check fairly useful for lock-free and
wait-free algorithms.


Optimizations. Lincheck uses various heuristics to speed up the analysis and 
increase the coverage. The most impactful one excludes final field accesses from 
the analysis, as their values never change. Our internal experiments indicate that
this reduces the number of inserted switch points by more than a factor of two in
real-world code.
Another important optimization tracks objects that are not shared with other 
threads, excluding accesses to them from the analysis. This heuristic eliminates 
an additional 10-15% of switch points in practice. 


Happens-Before. When an operation starts, Lincheck collects which opera- 
tions from other threads are already completed to establish the “happens-before” 
relation; this information is further passed to the results verifier. 
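
One plausible way to record this information is to keep a per-thread counter of completed operations and to snapshot all counters whenever an operation starts. The sketch below only illustrates that idea; HappensBeforeTracker and its methods are hypothetical names and do not correspond to Lincheck's internals.

```kotlin
import java.util.concurrent.atomic.AtomicIntegerArray

// Tracks, per thread, how many operations have completed so far.
class HappensBeforeTracker(private val threads: Int) {
    private val completed = AtomicIntegerArray(threads)

    // Called when `thread` starts an operation. Entry u of the snapshot is the number of
    // operations of thread u that had already completed at the time of the snapshot,
    // so those operations "happen before" the starting one.
    // The entries are read one by one, so this is not an atomic snapshot across threads.
    fun onOperationStart(thread: Int): IntArray {
        require(thread in 0 until threads)
        return IntArray(threads) { u -> completed.get(u) }
    }

    // Called when `thread` finishes its current operation.
    fun onOperationEnd(thread: Int) {
        completed.incrementAndGet(thread)
    }
}

fun main() {
    val tracker = HappensBeforeTracker(threads = 2)
    val first = tracker.onOperationStart(0)  // nothing has completed yet
    tracker.onOperationEnd(0)
    val second = tracker.onOperationStart(1) // sees one completed operation of thread 0
    println("${first.toList()} ${second.toList()}")
}
```

A verifier can then reject any sequential history that places an operation before one of the operations counted in its snapshot.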


Modular Testing. When constructing new algorithms, it is common to use 
existing non-trivial data structures as building blocks. Considering such under- 
lying data structures to be correct and treating their operations as atomic may 
significantly reduce the number of possible interleavings and check only mean- 
ingful ones, thus increasing the testing quality. Lincheck makes it possible with 
the modular testing feature; please read the official guide for details [22]. 


Limitations. For the model checking mode, the tested data structure must be
deterministic to ensure reproducible executions, which is a common requirement
for bug-reproducing tools [24]. For the algorithms that utilize randomization,
Lincheck offers out-of-the-box support by fixing seeds for Random, thus making
the latter deterministic. In our experience, Random is the only source of
nondeterminism in practical concurrent algorithms.


Model Checking vs Stress Testing. The primary benefit of using model 
checking is obtaining a comprehensive trace reproducing the detected error, 
as demonstrated in Fig. 1. However, the current implementation assumes the 
sequentially consistent memory model, which can result in missed bugs caused 
by low-level effects, such as an omitted volatile modifier in Java. We are in 
the process of incorporating the GenMC algorithm [6,25] to support weak memory
models and increase analysis coverage through the partial order reduction
technique. In the meantime, we suggest using stress testing in addition to model
checking.


2 The obstruction-freedom property ensures that any operation completes within a
limited number of steps if all other threads are stopped.


2.3 Phase 3: Verification of Outcome Results 


Once the scenario is executed, the operation results should be verified against 
the specified correctness property, which is linearizability [9] by default. In brief, 
Lincheck tries to match the operation results to a sequential history that pre- 
serves the order of operations in threads and does not violate the “happens- 
before” relation established during the execution. 


LTS. Instead of generating all possible sequential executions, Lincheck lazily
builds a labeled transition system (LTS) [26] and tries to explain the obtained
results using it. Roughly, an LTS is a directed graph whose nodes represent the
data structure states, while edges specify the transitions and are labeled with
operations and their results. Execution results are considered valid if there exists
a finite path in the LTS (i.e., a sequential history) that leads to the same results.
Lincheck lazily builds the LTS by invoking operations on the tested data structure
in one thread. Thus, the sequential behavior is specified implicitly. Figure 2
illustrates an LTS lazily constructed by Lincheck for verifying incorrect results
of ConcurrentLinkedDeque from Fig. 1.
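The lazy LTS exploration can be illustrated with a small sketch. The Python code below is not Lincheck's implementation: the names `explained_by_lts`, `Step`, and `deque_step` are hypothetical, the sequential specification is assumed to be a pure function over immutable states, and the happens-before constraints are ignored for brevity. The search creates LTS nodes only when it reaches them and follows only transitions whose labels match the observed results.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple

# One observed operation: (method name, arguments, observed result).
Op = Tuple[str, tuple, object]
# Hypothetical sequential specification: (state, name, args) -> (new state, result).
Step = Callable[[Hashable, str, tuple], Tuple[Hashable, object]]


def explained_by_lts(threads: Sequence[List[Op]], initial: Hashable, step: Step) -> bool:
    """Return True if some sequential history explains the observed results.

    LTS nodes are pairs (state, per-thread positions); they are created lazily,
    only when the search reaches them.
    """
    visited: Set[Tuple[Hashable, Tuple[int, ...]]] = set()

    def search(state: Hashable, pos: Tuple[int, ...]) -> bool:
        if all(p == len(t) for p, t in zip(pos, threads)):
            return True  # every operation has been placed in the history
        if (state, pos) in visited:
            return False  # this node was already explored without success
        visited.add((state, pos))
        for i, ops in enumerate(threads):
            if pos[i] == len(ops):
                continue
            name, args, observed = ops[pos[i]]
            new_state, expected = step(state, name, args)
            if expected != observed:
                continue  # no transition with this label from the current node
            if search(new_state, pos[:i] + (pos[i] + 1,) + pos[i + 1:]):
                return True
        return False

    return search(initial, tuple(0 for _ in threads))


def deque_step(state: tuple, name: str, args: tuple):
    """Toy sequential deque over immutable tuples (an assumption of this sketch)."""
    if name == "addFirst":
        return (args[0],) + state, None
    if name == "addLast":
        return state + (args[0],), None
    if name == "peekFirst":
        return state, (state[0] if state else None)
    if name == "pollFirst":
        return state[1:], (state[0] if state else None)
    raise ValueError(name)


# Two threads; each list preserves the order of operations within its thread.
explainable = [[("addFirst", (1,), None), ("peekFirst", (), 1)],
               [("addFirst", (2,), None)]]
unexplainable = [[("addLast", (1,), None), ("peekFirst", (), 2)],
                 [("addLast", (2,), None), ("peekFirst", (), 1)]]
```

In the second scenario no sequential history exists: once both elements are appended, the first element never changes, so the two peekFirst results contradict each other.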


Sequential Specification. By default, Lincheck sequentially manipulates the
tested data structure to build an LTS. It is possible to specify the sequential
behavior explicitly, providing a separate class with the same methods as those
annotated with @Operation. It allows for a single Lincheck test instead of
separate sequential and concurrent ones. For API details, please refer to the
guide [22].


Fig. 2. An LTS constructed for verifying ConcurrentLinkedDeque results from
Fig. 1.


Validation Functions. It is possible 
to validate the data structure invari- 
ants at the end of the test, adding the 
corresponding function and annotating it with @Validate. For example, we have 
uncovered a memory leak in the algorithm for removing nodes from a concurrent 
linked list in [27] by validating that logically removed nodes are unreachable at 
the end. 


Linearizability Relaxations. In addition to linearizability, Lincheck supports
various relaxations, such as quiescent consistency [10], quantitative relaxation [11],
and quasi-linearizability [12].


Blocking Operations. Some structures are blocking by design, such as Mutex or
Channel. Consider a rendezvous channel, also known as “synchronous
queue”, as an example: senders and receivers perform a rendezvous handshake
as a part of their protocol (senders wait for receivers and vice versa). If we run 
send(e) and receive() in parallel, they both succeed. However, executing the 
operations sequentially will result in suspending the first one. To reason about 
correctness, the dual data structures formalism [28] is usually used. Essentially, it 
splits each operation into two parts at the point of suspension, linearizing these 
parts separately. We extend this formalism by allowing suspended requests to 
cancel and by making it more efficient for verification. 
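The idea of splitting an operation at its suspension point can be sketched as follows. The Python model below is only an illustration under our own assumptions, not Lincheck's implementation of the formalism: the class `RendezvousChannelSpec`, the ticket mechanism, and the `follow_up` method are hypothetical names, and cancellation of suspended requests is not modeled.

```python
from collections import deque
from typing import Deque, Dict, Tuple

SUSPENDED = "SUSPENDED"  # returned by the first part of an operation that suspends
DONE = "DONE"            # returned when an operation (or its follow-up) completes


class RendezvousChannelSpec:
    """Sequential model of a rendezvous channel with operations split in two parts."""

    def __init__(self) -> None:
        self._senders: Deque[Tuple[int, object]] = deque()  # (ticket, element)
        self._receivers: Deque[int] = deque()                # tickets of receivers
        self._resumed: Dict[int, object] = {}                # ticket -> final result
        self._next_ticket = 0

    def _ticket(self) -> int:
        self._next_ticket += 1
        return self._next_ticket

    def send(self, element):
        """First part of send: completes at once if a receiver is waiting."""
        if self._receivers:
            self._resumed[self._receivers.popleft()] = element
            return (DONE, None)
        ticket = self._ticket()
        self._senders.append((ticket, element))
        return (SUSPENDED, ticket)

    def receive(self):
        """First part of receive: completes at once if a sender is waiting."""
        if self._senders:
            ticket, element = self._senders.popleft()
            self._resumed[ticket] = None  # the resumed send() returns nothing
            return (DONE, element)
        ticket = self._ticket()
        self._receivers.append(ticket)
        return (SUSPENDED, ticket)

    def follow_up(self, ticket: int):
        """Second part: linearized separately, after the request was resumed."""
        if ticket not in self._resumed:
            return (SUSPENDED, ticket)
        return (DONE, self._resumed.pop(ticket))
```

Executed sequentially, `send(42)` suspends and yields a ticket; a later `receive()` completes with 42 and resumes the sender, whose `follow_up` then completes as well.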


3 Evaluation 


Lincheck has already gained adoption in Kotlin and Java communities, as well 
as by companies and universities. It has been integrated into the development 
processes of Kotlin Coroutines [16] and JCTools [17], enabling reliable testing of 
their core data structures, and was used to find several new bugs in popular con- 
currency libraries and algorithms published at top-tier conferences. Furthermore, 
for over five years, we have successfully used Lincheck in our “Parallel Program- 
ming” course to automate the verification of more than 4K student solutions per 
year. Notably, many users appear to especially appreciate Lincheck’s low entry 
threshold and its ability to “explain” errors with detailed traces. 


Novel Bugs Discovered with Lincheck. We have uncovered multiple 
new concurrency bugs in popular libraries and authors’ implementations of 
algorithms published at top conferences. These bugs are listed in Table 1 
and include some found in the standard Java library. Lincheck not only 
detects non-linearizability and unexpected exception bugs, but also liveness
issues. For example, it identified an obstruction-freedom violation in Java’s 
AbstractQueuedSynchronizer framework, which is a foundation for building 
most synchronization primitives in the standard Java library. 

Notably, the tests that uncover the bugs listed in Table 1 are publicly
available [18], allowing readers to easily reproduce these bugs. 


Running Time Analysis. We have designed Lincheck for daily use and expect 
it to be fast enough in interactive mode. Various factors, including the complexity 
of the testing algorithm and the number of threads, operations, and invocations, 
can impact its performance. We suggest using two configurations for the best user 
experience and robustness: a fast configuration for local builds to catch simple 
bugs quickly and a long configuration to perform a more thorough analysis on 
CI/CD (Continuous Integration) servers: 


- Fast: 30 scenarios of 2 threads × 3 operations, 1000 invocations each;
- Long: 100 scenarios of 3 threads × 4 operations, 10000 invocations each.
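To get a feeling for the amount of work behind the two configurations, the following Python sketch multiplies out the numbers listed above. The class `RunConfig` and its methods are hypothetical helpers for this calculation, not part of the Lincheck API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    scenarios: int       # number of generated scenarios
    threads: int         # parallel threads per scenario
    ops_per_thread: int  # operations per thread
    invocations: int     # invocations per scenario

    def total_invocations(self) -> int:
        """How many times a scenario is executed overall."""
        return self.scenarios * self.invocations

    def total_operation_calls(self) -> int:
        """Upper estimate of executed operations in the parallel part."""
        return self.total_invocations() * self.threads * self.ops_per_thread


FAST = RunConfig(scenarios=30, threads=2, ops_per_thread=3, invocations=1000)
LONG = RunConfig(scenarios=100, threads=3, ops_per_thread=4, invocations=10000)

# Ratio of scenario executions between the long and the fast configuration.
ratio = LONG.total_invocations() / FAST.total_invocations()
```

Under these numbers, the fast configuration performs 30,000 scenario executions and the long one 1,000,000, which gives an intuition for the gap between the running times of the two configurations.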


We assess the performance and reliability of Lincheck with these fast and long
configurations by measuring the testing times and showing whether the expected
bugs were detected. We run the experiment on the buggy algorithms listed in
Table 1, along with ConcurrentHashMap and ConcurrentLinkedQueue from
the Java standard library and a quasi-linearizable LockFreeTaskQueue with
Semaphore from Kotlin Coroutines. The results are available in Table 2. The
experiment was conducted on a Xiaomi Mi Notebook Pro 2019 with Intel(R)
Core(TM) i7-8550U CPU @ 1.80GHz and 32 GB RAM. The results show that
the fast configuration ensures short running times, being suitable for use as unit
tests without slowing down the build and able to uncover some bugs. However,
some bugs are detected only with the long configuration, emphasizing the need
for more operations and invocations to check correctness thoroughly. Despite this,
the running time remains practical and acceptable.


Table 1. Novel bugs discovered with Lincheck; tests are publicly available [18].

Source                  Data structure              Description
Java                    ConcurrentLinkedDeque       Non-linearizable [13]; see Fig. 1
Java                    AbstractQueuedSynchronizer  Liveness error
Kotlin Coroutines [16]  Mutex                       Liveness error [20]
JCTools [17]            NonBlockingHashMapLong      Non-linearizable [19]
Concurrent-Trees [29]   ConcurrentRadixTree         Non-linearizable [30]
PPoPP’10 [31]           SnapTree                    Unexpected internal exception
PPoPP’14 [32]           LogicalOrderingAVL (a)      Deadlock
ISPDC’15 [33]           CATree                      Deadlock
Euro-Par’17 [34]        ConcurrencyOptimalTree      Unexpected internal exception

(a) The deadlock in the LogicalOrderingAVL algorithm was originally found by Trevor
Brown and later confirmed with Lincheck.


Table 2. Running times of Lincheck tests with fast and long configurations using
both stress testing and model checking (MC) for the listed data structures. Failed
tests, which detect bugs, are highlighted with red. Notably, finding a bug may take
longer than testing a correct implementation due to scenario minimization.

                                       Fast configuration   Long configuration
Data structure                         Stress    MC         Stress    MC
ConcurrentHashMap (Java)               0.3s      2.7s       38.1s     1m 44s
ConcurrentLinkedQueue (Java)           0.4s      1.7s       1m 26s    1m 41s
LockFreeTaskQueue (Kotlin Coroutines)  1.1s      1.4s       39.6s     54.8s
Semaphore (Kotlin Coroutines)          2.1s      3.6s       22.3s     1m 44s
ConcurrentLinkedDeque (Java)           0.4s      1.2s       19.7s     10.7s
AbstractQueuedSynchronizer (Java)      16s       0.5s       18.2s     8.6s
Mutex (Kotlin Coroutines)              0.9s      2.6s       23.6s     8.7s
NonBlockingHashMapLong (JCTools)       0.6s      1.3s       4.4s      7s
ConcurrentRadixTree [29]               2.9s      10.6s      40.9s     2m 30s
SnapTree [31]                          1.7s      5.8s       38.4s     5m 6s
LogicalOrderingAVL [32]                15s       4.2s       17.1s     36.9s
CATree [33]                            20.1s     0.8s       41.3s     6.5s
ConcurrencyOptimalTree [34]            0.4s      1.5s       3s        7.3s


4 Related Work 


Several excellent tools for linearizability testing and model checking have been 
proposed, e.g. [4,5,35-41], and some even support relaxed memory models [6, 25, 
42,43] and linearizability relaxations [36,44]. Due to space limitations, we focus 
our discussion on the works that shaped Lincheck. 


Inspiration. Lincheck was originally inspired by the JCStress [3] tool for JVM, 
which is designed to test the memory model implementation. However, JCStress 
does not offer a declarative approach to writing tests. The bounded model 
checker in Lincheck was influenced by CHESS (Line-Up) [5] for C#, which is 
also non-declarative and does not support linearizability extensions. Lincheck 
offers several novel features and usability advantages compared to these inspira- 
tions, making it a versatile platform for research in testing and model checking. 
Although other tools such as GenMC [6, 25,43] have superior features, Lincheck 
is designed to be extensible and can integrate new tools. In particular, we are 
working on incorporating the GenMC algorithm into Lincheck at the time of
writing.


Lincheck Compared to Other Solutions. To the best of our knowledge, 
no other tool offers similar functionality. In particular, Lincheck allows certain 
operations to never execute in parallel (supporting single-producer/consumer
constraints), detects obstruction-freedom violations (which is crucial for checking 
non-blocking algorithms), provides a way to specify sequential behavior explic- 
itly (enabling oracle-based testing), and supports blocking operations for Kotlin 
Coroutines. Furthermore, Lincheck is a highly user-friendly framework, featur- 
ing a simple API and easy-to-understand output, which we have found users to 
highly appreciate. 


5 Discussion 


We introduced Lincheck, a versatile and expandable framework for testing con- 
current data structures. As Lincheck is not just a tool but a platform for incor- 
porating advancements in concurrency testing and model checking, we plan to 
integrate cutting-edge model checkers that support weak memory models. Writ- 
ten in Kotlin, Lincheck is also interoperable with native languages such as Swift 
or C/C++. Our goal is to extend Lincheck testing to these languages, making 
it the leading tool for checking correctness of concurrent algorithms. We believe 
that Lincheck has the potential to significantly improve the quality and effi- 
ciency of concurrent algorithms development, reducing time and effort to write 
reliable tests and investigate bugs. 


References 


1. Lincheck - A framework for testing concurrent data structures on JVM.
   https://github.com/Kotlin/kotlinx-lincheck
2. Yu, Y., Manolios, P., Lamport, L.: Model checking TLA+ specifications. In: Pierre,
   L., Kropf, T. (eds.) CHARME 1999. LNCS, vol. 1703, pp. 54-66. Springer, Heidelberg
   (1999). https://doi.org/10.1007/3-540-48153-2_6
3. The Java Concurrency Stress tests.
   https://openjdk.java.net/projects/code-tools/jcstress
4. Lindstrom, G., Mehlitz, P.C., Visser, W.: Model checking real time java using java
   pathfinder. In: Peled, D.A., Tsay, Y.-K. (eds.) ATVA 2005. LNCS, vol. 3707, pp.
   444-456. Springer, Heidelberg (2005). https://doi.org/10.1007/11562948_33
5. Musuvathi, M., Qadeer, S.: Iterative context bounding for systematic testing of
   multithreaded programs. SIGPLAN Not. 42(6), 446-455 (2007)
6. Kokologiannakis, M., Vafeiadis, V.: GENMC: a model checker for weak memory
   models. In: Silva, A., Leino, K.R.M. (eds.) CAV 2021. LNCS, vol. 12759, pp. 427-440.
   Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81685-8_20
7. Jung, R., et al.: Iris: monoids and invariants as an orthogonal basis for concurrent
   reasoning. In: Conference Record of the Annual ACM Symposium on Principles of
   Programming Languages 2015, pp. 637-650 (2015)
8. Coq - a formal proof management system. https://coq.inria.fr
9. Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent
   objects. ACM Trans. Program. Lang. Syst. (TOPLAS) 12(3), 463-492 (1990)
10. Herlihy, M., Shavit, N., Luchangco, V., Spear, M.: The art of multiprocessor
    programming. Newnes (2020)
11. Henzinger, T.A., Kirsch, C.M., Payer, H., Sezgin, A., Sokolova, A.: Quantitative
    relaxation of concurrent data structures. In ACM SIGPLAN Notices, vol. 48, pp.
    317-328. ACM (2013)
12. Afek, Y., Korland, G., Yanovsky, E.: Quasi-linearizability: relaxed consistency for
    improved concurrency. In: Lu, C., Masuzawa, T., Mosbah, M. (eds.) OPODIS 2010.
    LNCS, vol. 6490, pp. 395-410. Springer, Heidelberg (2010).
    https://doi.org/10.1007/978-3-642-17653-1_29
13. [JDK-8256833] ConcurrentLinkedDeque is non-linearizable.
    https://bugs.openjdk.java.net/browse/JDK-8256833
14. [JDK-8188900] ConcurrentLinkedDeque linearizability.
    https://bugs.openjdk.java.net/browse/JDK-8188900
15. [JDK-8189387] ConcurrentLinkedDeque linearizability continued.
    https://bugs.openjdk.java.net/browse/JDK-8189387
16. Kotlin Coroutines. https://github.com/Kotlin/kotlin-coroutines
17. JCTools - Java Concurrency Tools for the JVM. https://github.com/JCTools/JCTools
18. Lincheck: A Practical Framework for Testing Concurrent Data Structures on JVM.
    Zenodo (2023)
19. Race in NonBlockingHashMapLong in JCTools.
    https://github.com/JCTools/JCTools/issues/319
20. MutexLincheckTest.modelCheckingTest detects non lock-free execution path in
    Mutex #2590. https://github.com/Kotlin/kotlinx.coroutines/issues/2590
21. Lu, S., Park, S., Seo, E., Zhou, Y.: Learning from mistakes: a comprehensive study
    on real world concurrency bug characteristics. In: Proceedings of the 13th
    International Conference on Architectural Support for Programming Languages and
    Operating Systems, pp. 329-339 (2008)
22. Lincheck User Guide. https://kotlinlang.org/docs/lincheck-guide.html
23. ObjectWeb ASM. https://asm.ow2.io
24. Elmas, T., Burnim, J., Necula, G., Sen, K.: Concurrit: a domain specific language
    for reproducing concurrency bugs. ACM SIGPLAN Not. 48, 06 (2013)
25. Kokologiannakis, M., Raad, A., Vafeiadis, V.: Model checking for weakly consistent
    libraries. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming
    Language Design and Implementation, PLDI 2019, pp. 96-110. Association for
    Computing Machinery, New York (2019)
26. Tretmans, J.: Conformance testing with labelled transition systems: implementation
    relations and test generation. Comput. Netw. ISDN Syst. 29(1), 49-79 (1996)
27. Koval, N., Alistarh, D., Elizarov, R.: Scalable fifo channels for programming via
    communicating sequential processes. In European Conference on Parallel Processing.
    Springer, Heidelberg (2019).
    http://pub.ist.ac.at/dalistar/Scalable_FIFO_Channels_EuroPar.pdf
28. Scherer III, W.N., Scott, M.L.: Nonblocking concurrent objects with condition
    synchronization. In: Proceedings of the 18th International Symposium on Distributed
    Computing, pp. 2121-2128 (2004)
29. Concurrent Radix and Suffix Trees for Java.
    https://github.com/npgall/concurrent-trees
30. Race in ConcurrentSuffixTree in Concurrent-Trees library.
    https://github.com/npgall/concurrent-trees/issues/33
31. Bronson, N.G., Casper, J., Chafi, H., Olukotun, K.: A practical concurrent binary
    search tree. SIGPLAN Not. 45(5), 257-268 (2010)
32. Drachsler, D., Vechev, M., Yahav, E.: Practical concurrent binary search trees via
    logical ordering. In: Proceedings of the 19th ACM SIGPLAN Symposium on Principles
    and Practice of Parallel Programming, PPoPP 2014, pp. 343-356. Association
    for Computing Machinery, New York (2014)
33. Sagonas, K., Winblad, K.: Contention adapting search trees. In: 2015 14th
    International Symposium on Parallel and Distributed Computing, pp. 215-224 (2015)
34. Aksenov, V., Gramoli, V., Kuznetsov, P., Malova, A., Ravi, S.: A concurrency-optimal
    binary search tree, pp. 580-593 (2017)
35. Pradel, M., Gross, T.R.: Fully automatic and precise detection of thread safety
    violations. In: Proceedings of the 33rd ACM SIGPLAN conference on Programming
    Language Design and Implementation, pp. 521-530 (2012)
36. Burckhardt, S., Dern, C., Musuvathi, M., Tan, R.: Line-up: a complete and automatic
    linearizability checker. In: Proceedings of the 31st ACM SIGPLAN Conference
    on Programming Language Design and Implementation, pp. 330-340 (2010)
37. Li, G., Lu, S., Musuvathi, M., Nath, S., Padhye, R.: Efficient scalable
    thread-safety-violation detection: finding thousands of concurrency bugs during
    testing. In: Proceedings of the 27th ACM Symposium on Operating Systems Principles,
    pp. 162-180 (2019)
38. O’Callahan, R., Choi, J.D.: Hybrid dynamic data race detection. In: Proceedings
    of the Ninth ACM SIGPLAN Symposium on Principles and Practice of Parallel
    Programming, PPoPP 2003, pp. 167-178. Association for Computing Machinery,
    New York (2003)
39. Naik, M., Aiken, A., Whaley, J.: Effective static race detection for java. In:
    Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design
    and Implementation, PLDI 2006, pp. 308-319. Association for Computing Machinery,
    New York (2006)
40. Sen, K.: Race directed random testing of concurrent programs. In PLDI 2008 (2008)
41. Huang, J., O’Neil Meredith, P., Rosu, G.: Maximal sound predictive race detection
    with control flow abstraction. SIGPLAN Not. 49(6), 337-348 (2014)
42. Alglave, J., Kroening, D., Tautschnig, M.: Partial orders for efficient bounded
    model checking of concurrent software. In: Proceedings of the 25th International
    Conference on Computer Aided Verification, vol. 8044, pp. 141-157 (2013)
43. Kokologiannakis, M., Lahav, O., Sagonas, K., Vafeiadis, V.: Effective stateless
    model checking for c/c++ concurrency. Proc. ACM Program. Lang. 2(POPL)
    (2017)
44. Emmi, M., Enea, C.: Violat: generating tests of observational refinement for
    concurrent objects. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11562, pp.
    534-546. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25543-5_30


Open Access This chapter is licensed under the terms of the Creative Commons 
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), 
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 


The images or other third party material in this chapter are included in the
chapter’s Creative Commons license, unless indicated otherwise in a credit line to the
material. If material is not included in the chapter’s Creative Commons license and
your intended use is not permitted by statutory regulation or exceeds the permitted
use, you will need to obtain permission directly from the copyright holder.



nekton: A Linearizability Proof Checker 


Roland Meyer¹, Anton Opaterny¹, Thomas Wies², and Sebastian Wolff²


¹ TU Braunschweig, Braunschweig, Germany
{roland.meyer,anton.opaterny}@tu-bs.de
² New York University, New York, USA
{wies,sebastian.wolff}@cs.nyu.edu


Abstract. nekton is a new tool for checking linearizability proofs of highly 
complex concurrent search structures. The tool’s unique features are its paramet- 
ric heap abstraction based on separation logic and the flow framework, and its 
support for hindsight arguments about future-dependent linearization points. We 
describe the tool, present a case study, and discuss implementation details. 


Keywords: separation logic · proof checker · linearizability · flow framework


1 Introduction 


We present nekton, a mostly automated deductive program verifier based on separa- 
tion logic (SL) [23,27]. The tool is designed to aid the construction of linearizabil- 
ity proofs for complex concurrent search structures. Similar to many other SL-based 
tools [2,8, 14,22,33,33], nekton uses an SMT solver to automate basic SL reasoning. 
Similar to the original implementation of CIVL [7], it uses non-interference
reasoning à la Owicki-Gries [25] to automate thread modularity. What makes nekton stand
out among these relatives is its built-in support for expressing complex inductive heap
invariants using the flow framework [12,13,20] and the ability to (partially) automate 
complex linearizability arguments that require hindsight reasoning [4,5, 15,18, 19,24]. 
Together, these features enable nekton to verify challenging concurrent data structures 
such as the FEMRS tree [4] with little user guidance. 

nekton [17] is derived from the tool plankton [18,19], which shares the same 
overall goals and features as nekton but strives for full proof automation at the expense 
of generality. In terms of the trade-off between automation and expressivity, nekton 
aims to occupy a sweet spot between plankton and general purpose program verifiers. 
In the following, we discuss nekton’s unique features in more detail and explain how 
it deviates from plankton’s design. 

The flow framework can be used to express global properties of graph structures 
in a node-local manner, aiding compositional verification of recursive data structures. 
The framework is parametric in a flow domain which determines what global infor- 
mation about the graph is provided at each node. Various flow domains have been 
proposed that have been shown to be useful in concurrency proofs [11,26]. To simplify
proof automation, plankton uses a fixed flow domain that is geared towards verifying
functional correctness of search structures. In contrast, nekton is parametric in the
flow domain. For instance, it supports custom domains for reasoning about overlaid
structures and other data-structure-specific invariants. This design choice significantly
increases the expressivity of the tool at the cost of a mild increase in the annotation bur- 
den for the user. For instance, the FEMRS tree case study that we present in this paper 
relies on a flow domain that is beyond the scope of plankton. In fact, the flow domain 
is also beyond state-of-the-art abstract interpretation-based verification tools checking 
linearizability [1]. However, computing relative to a given flow domain is considerably 
more difficult than computing with a hard-coded one: it requires parametric versions for 
(1) computing post images, (2) checking entailment, and (3) checking non-interference. 
Yet, it still allows for more automation than general user-defined (recursive)
predicates as accepted by, e.g., Viper [22] and VeriFast [9].

The second key feature of nekton is its support for hindsight reasoning. Intuitively, 
hindsight arguments rely on statements of the form “if q holds in the current state and 
p held in some past state, then r must have held in some intermediate state”. Such 
arguments can greatly simplify the reasoning about complex concurrent algorithms that 
involve future-dependent linearization points. At a technical level, hindsight reasoning 
is realized by lifting a state-based separation logic to one defined over computation 
histories [18,19]. nekton’s support for this style of reasoning goes beyond the simple 
hindsight rule in [18] but does not yet implement the general temporal interpolation 
rule introduced more recently in [19], which is already supported by plankton. 

These features set nekton apart from its competitors. First, it offers more expres- 
sivity compared to tools with a higher degree of automation like plankton [18,19], 
Cave [29-31], and Poling [34]. Second, its proofs require less annotation effort than
more flexible refinement proofs for fine-grained concurrency, like those of CIVL [7, 10]
and Armada [16]. Last, it integrates techniques for proving linearizability, which are
missing in industrial-grade tools like Anchor [6].

In the remainder of this paper, we provide a high-level overview of the tool (Sect. 2), 
present a case study (Sect. 3), and discuss implementation details, some of which also
concern plankton and have not been reported before (Sect. 4).


2 Input 


nekton checks the correctness of proof outlines for the linearizability of concurrent 
data structures. Its distinguishing feature compared to its ancestor plankton is that 
the heap abstraction is not hard-coded inside the tool, but taken as an input parameter. 
That is, nekton’s input is a heap abstraction and a set of proof outlines, one for each 
function manipulating the data structure state. The heap abstraction defines how the 
data structure’s heap representation is mapped onto a labeled graph that captures the 
properties of interest and that can then be reasoned about in separation logic. It also 
embeds the mechanism for checking linearizability. 

nekton works with the recent flow graphs proposed by Krishna et al. [12,13], in 
their latest formulation due to [18]. Flow graphs augment heap graphs with ghost state. 
The ghost state can be understood as a certificate formulating global properties of heap 
graphs in a node-local manner. It takes the form of a so-called flow value that has been 
propagated through the heap graph and, therefore, brings global information with it. The 
propagation is like in static analysis, except that we work over heap graphs rather than 
control-flow graphs. To give an example, assume we want to express the global property 
that the heap graph is a tree. A helpful certificate would be the path count, the number
of paths from a distinguished root node to the node of interest. It allows us to formulate 
the tree property node-locally, by saying that the path count is always at most one. 
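The path-count certificate can be made concrete with a short sketch. The Python code below is an illustration under our own assumptions and not part of nekton: the function names are hypothetical, the heap graph is given as a successor map, and counts saturate at a cap of 2 so that the fixed-point iteration also terminates on cyclic graphs.

```python
from typing import Dict, List


def path_counts(succ: Dict[str, List[str]], root: str, cap: int = 2) -> Dict[str, int]:
    """Least fixed point of the path-count equations, saturated at `cap`.

    count[n] = (1 if n is the root else 0) + sum of count[p] over edges p -> n.
    """
    nodes = set(succ) | {m for targets in succ.values() for m in targets} | {root}
    count = {n: 0 for n in nodes}
    changed = True
    while changed:  # values only grow and are bounded by cap, so this terminates
        changed = False
        for n in nodes:
            incoming = sum(count[p] for p in succ for m in succ[p] if m == n)
            new = min(cap, (1 if n == root else 0) + incoming)
            if new != count[n]:
                count[n] = new
                changed = True
    return count


def satisfies_tree_certificate(succ: Dict[str, List[str]], root: str) -> bool:
    """Node-local check: every node has a path count of at most one."""
    return all(c <= 1 for c in path_counts(succ, root).values())


tree = {"r": ["a", "b"], "a": ["c"], "b": []}
shared = {"r": ["a", "b"], "a": ["c"], "b": ["c"]}  # c is reachable on two paths
cyclic = {"r": ["a"], "a": ["r"]}                   # a cycle through the root
```

The check inspects one node at a time, yet a violation at any node (such as `c` in `shared`) witnesses a violation of the global tree shape.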

Our first input is a flow domain (M, gen). The parameter (M, +, 0) is a commutative
monoid from which we draw the flow values. The propagation needs standard fixed
point theory: the natural ordering a ≤ a + b for a, b ∈ M on the monoid should form
an ω-complete partial order. We expect the user to specify both + and ≤ to avoid the
quantifier over the offset in the definition of ≤. The parameter gen generates the
transfer functions labeling the edges in the heap graph. Transfer functions transform
flow values to record information about the global shape. The generator has the type


\[ \mathit{gen} : \mathit{PointerFld} \to (\mathit{DataFld} \to \mathit{Data}) \to \mathit{Mon}(M \to M). \]


We assume flow graphs distinguish between pointer fields (PointerFld) and fields that 
hold data values (DataFld). Flow values are propagated along every pointer field, in a 
way that depends on the current data values but that does not depend on the target of 
the field. To see that the data values are important, imagine a node has already been 
deleted logically but not yet physically from a data structure, as is often the case in 
lock-free processing. Then the logical deletion would be indicated by a raised flag (a 
distinguished data field), and we would not forward the current path count. To reason 
about flow values with SMT solvers, we restrict the allowed types of flow values to 


\[ M \;::=\; \mathbb{B} \mid \mathbb{N} \mid \mathcal{P}(\mathbb{B}) \mid \mathcal{P}(\mathbb{N}) \mid M \times M. \]


Flow values are (sets of) Booleans or integers, or products over these base types. When 
defining a product type, the user has to label each component with a selector allowing 
to project a tuple onto this component. Importantly, the user can define the addition 
operation + for the flow monoid freely over the chosen type as long as the definition 
is expressible within the underlying SMT theory (e.g., for N one may choose as + the 
usual addition or the maximum). The tool likewise inherits the assertion language for 
integers and Booleans that is supported by the SMT solver. There are two more user- 
defined inputs that are tightly linked to the heap representation. 
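The interplay of the monoid operation and the generator can be sketched as follows. This Python code is a hedged illustration, not nekton's input language or implementation: `propagate`, `gen_path_count`, and the dictionary-based heap encoding are hypothetical, and the iteration assumes an acyclic heap graph. The generator receives a pointer field and the data fields of the source node, and returns a monotone transfer function; a marked node forwards nothing.

```python
from typing import Callable, Dict, Optional


def gen_path_count(pointer_field: str, data: Dict[str, object]) -> Callable[[int], int]:
    """Transfer function for path counts; depends on data fields, not on the target."""
    if data.get("marked"):
        return lambda m: 0  # logically deleted node: do not forward the path count
    return lambda m: m      # otherwise forward the incoming flow value unchanged


def propagate(heap, inflow, plus, zero, gen):
    """Compute flow values by Kleene iteration (sketch for acyclic heap graphs)."""
    flow = {n: zero for n in heap}
    for _ in range(len(heap) + 1):
        new_flow = {n: inflow.get(n, zero) for n in heap}
        for src, node in heap.items():
            for field, target in node["ptr"].items():
                if target is not None:
                    edge = gen(field, node["data"])
                    new_flow[target] = plus(new_flow[target], edge(flow[src]))
        if new_flow == flow:
            return flow
        flow = new_flow
    raise RuntimeError("no fixed point reached; the sketch assumes an acyclic graph")


def make_heap(d_marked: bool):
    """Two entry nodes a and d that both point to node c."""
    return {
        "a": {"ptr": {"next": "c"}, "data": {"marked": False}},
        "d": {"ptr": {"next": "c"}, "data": {"marked": d_marked}},
        "c": {"ptr": {"next": None}, "data": {"marked": False}},
    }


inflow = {"a": 1, "d": 1}
with_addition = propagate(make_heap(False), inflow, lambda x, y: x + y, 0, gen_path_count)
with_maximum = propagate(make_heap(False), inflow, max, 0, gen_path_count)
with_marked_d = propagate(make_heap(True), inflow, lambda x, y: x + y, 0, gen_path_count)
```

With addition as the monoid operation, node `c` receives the flow value 2 (two paths); with the maximum it receives 1; and once `d` is marked, only the path through `a` contributes.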


Linearizability. We establish the linearizability of functions manipulating a data
structure with the help of the keyset framework [11,28], which we encode using flows. A
crucial problem when proving linearizability is answering membership queries: we have to
determine whether a given key has been in the data structure at some point in time while the
function was running. The keyset framework localizes these membership queries from 
the overall data structure to single nodes. It assigns to each node n a set of keys for 
which n is responsible, in the sense that n has to answer the membership queries for 
these keys. This set of keys is n’s keyset. Imagine we have a singly linked list 


\[ \mathit{Head} \xrightarrow{(-\infty,\infty)} (n_1, 5) \xrightarrow{[6,\infty)} (n_2, \dagger 7) \xrightarrow{[6,\infty)} (n_3, 10) \xrightarrow{[11,\infty)} \cdots \]

The shared pointer Head propagates the keys in the interval (−∞, ∞) as a flow value
to node n1 holding key 5. This set is called n1’s inset. The inset of a node n contains all
keys k for which a search will reach n. If k > 5, the search will proceed to n2, otherwise
it will stay at n1. Thus, the keyset of n1 is (−∞, 5]. That is, if k ∈ (−∞, 5], the answer
to the membership query is determined by the test k = 5. Node n1 forwards [6, ∞)
to the successor node n2 with key 7. Since n2 has been logically deleted, indicated by
the tombstone †, it cannot answer membership queries: the keyset is empty. Instead,
the node forwards its entire inset [6, ∞) to node n3, which is now responsible for the
keyset [6, 10]. We speak of a framework because whether a given key k belongs to a
node’s keyset or whether it is propagated to one of the node’s successors is specific to 
each data structure, but the way in which the linearizability argument for membership 
queries is localized to individual flow graph nodes is always the same. 
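The inset and keyset computation for the list example can be replayed with a few lines of code. The Python sketch below is only an illustration and not taken from nekton: the function names and the representation of key sets as closed integer intervals `(lo, hi)` are assumptions made for this example.

```python
import math
from typing import List, Optional, Tuple

INF = math.inf
Interval = Optional[Tuple[float, float]]  # closed interval of integer keys, None = empty


def intersect(a: Interval, b: Interval) -> Interval:
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def insets_and_keysets(nodes: List[Tuple[str, int, bool]]):
    """Propagate insets along a sorted singly linked list.

    Each node is (name, key, marked). An unmarked node keeps the keys up to its
    own key and forwards the larger ones; a marked node forwards its whole inset.
    """
    inset: Interval = (-INF, INF)  # flow value propagated by the Head pointer
    result = []
    for name, key, marked in nodes:
        if marked:
            keyset, outset = None, inset
        else:
            keyset = intersect(inset, (-INF, key))
            outset = intersect(inset, (key + 1, INF))
        result.append((name, inset, keyset))
        inset = outset
    return result


example = [("n1", 5, False), ("n2", 7, True), ("n3", 10, False)]
table = insets_and_keysets(example)
```

For the example list, the computed keysets are (−∞, 5] for n1, the empty set for the marked node n2, and [6, 10] for n3, in line with the discussion above.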

In nekton, the user can define P(N) for sets of keys as (a component in) the flow 
domain of interest. With parameter gen, they can implement the propagation. We also 
provide flexibility in the definition of the keyset and membership queries in the form of 
two predicates rsp (responsible) resp. cnts (contains). To give an example, we would 
define 

$$\mathit{rsp}(x, k) \;\triangleq\; k \in x{\to}\mathit{flow.is} \,*\, k \le x{\to}\mathit{key} \,*\, \neg x{\to}\mathit{marked}.$$

With x→flow, we denote x’s flow value. The flow domain is a product, and we refer to
the component called is. With x→key and x→marked we denote x’s key and marked
fields. Formally, the dereference notation is a naming convention for logical variables
that refer to values of resources defined in the node-local invariant explained below.
Reconsider the example and let k = 6. The key belongs to the inset [6, ∞) that n2
receives from n1. We discussed that the node’s keyset is empty, and indeed rsp(n2, 6)
is false. For n3, we have rsp(n3, 6) true. With the predicate rsp in place, we can also
refer to n.keyset in assertions.
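The running list example can be replayed in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: keys are integers, insets are closed intervals with possibly infinite bounds, and the names Node, propagate, and rsp are hypothetical helpers rather than part of nekton's input language.

```python
import math
from dataclasses import dataclass

INF = math.inf


@dataclass
class Node:
    name: str
    key: int
    marked: bool  # True once the node is logically deleted


def propagate(nodes):
    """Compute the inset of every list node as a closed interval (lo, hi)."""
    insets = {}
    lo, hi = -INF, INF  # Head sends (-inf, inf) to the first node
    for node in nodes:
        insets[node.name] = (lo, hi)
        if not node.marked:
            # An unmarked node keeps the keys <= key and forwards the rest.
            lo = max(lo, node.key + 1)
        # A marked node forwards its entire inset unchanged.
    return insets


def rsp(node, inset, k):
    """Assumed reading of rsp: k is in the inset, k <= key, node unmarked."""
    lo, hi = inset
    return lo <= k <= hi and k <= node.key and not node.marked


# The list from the example: (n1, 5), (n2, 7, deleted), (n3, 10).
nodes = [Node("n1", 5, False), Node("n2", 7, True), Node("n3", 10, False)]
insets = propagate(nodes)
```

Querying rsp(nodes[1], insets["n2"], 6) and rsp(nodes[2], insets["n3"], 6) reproduces the two answers discussed above: the deleted node n2 is not responsible for key 6, whereas n3 is.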

For verifying functions with non-fixed linearization points, nekton implements the
hindsight principle [24]. Reasoning with this principle proceeds as follows. We record
information about bygone states of the data structure in past predicates ⟐a. For example,
⟐(k ∈ x→flow.is) says that the key of interest was in the node’s inset at some point
while the function was running. Moreover, the assertion about the current state may tell
us that the key is at most the key held by the node and that the node is not marked
now, k ≤ x→key ∗ ¬x→marked. Then the hindsight principle will guarantee that there
has been a state in between the two moments where the node still had the key in its inset,
the inequality held true, and the node was unmarked. This is ⟐rsp(x, k) as defined
above. To draw this conclusion, the hindsight principle inspects the interferences the
data structure state may experience from concurrently executed functions. In the example,
no interference can unmark a node or change a key. So the predicates encountered in
the current state must have held already in the past state when k ∈ x→flow.is was true.
This form of hindsight reasoning is stronger than the one in [18] but not yet as elaborate
as the one in [19]. From a program logic point of view, hindsight reasoning relies on a
lifting of state-based to computation-based separation algebras [18].


Implications. Reasoning about automatically generated transfer functions is difficult, 
in particular when they relate different components in a product flow domain. Consider 
N × P(N) with the first component the path count at a node and the second component
the keyset. The transfer functions will never forget to count a path, and so the following
implication will be valid over all heap graphs:

$$x{\to}\mathit{flow.pcount} = 0 \;\implies\; x{\to}\mathit{flow.keyset} = \emptyset. \tag{1}$$

Despite the help of an SMT solver, nekton will fail to establish the validity of such an
implication. Therefore, the user may input a set of such formulas that the tool will then
accept as valid without further checks. Correctness of a proof is always relative to
this set of implications.


2.1 Proof Outlines 


A concurrent data structure consists of a set of structs defining the heap elements and 
a set of functions for manipulating the data structure state. nekton expects as input a 
proof outline for each such function. The program logic implemented by nekton is an 
Owicki-Gries system that, besides partial correctness, requires interference freedom of 
the given proof outlines. The user is expected to give the interferences as input. 

The proof outlines accepted by nekton take the form { pre } po { post } with

$$po \;::=\; com \;\mid\; \{a\} \;\mid\; po;\,po \;\mid\; (po + po)\{a\} \;\mid\; \{a\}\, po^{*}\, \{a\} \;\mid\; \mathtt{atomic}\ po.$$

The proof outlines are partial in that intermediary assertions, say in com1; com2, may
be omitted. nekton will automatically generate the missing information using strongest
postconditions. What has to be given are loop invariants and unifying assertions for
the different branches of if-then-else statements. Consecutive assertions { a }; { b } are
interpreted as a weakening of a to b.

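Programs are given in a dialect of C. Commands are assignments to/from variables
and memory locations, allocations, assumptions, and acquires/releases of locks:

$$com \;::=\; p := q \;\mid\; p{\to}\mathit{fld} := q \;\mid\; p := q{\to}\mathit{fld} \;\mid\; p := \mathtt{malloc} \;\mid\; \mathtt{assume}(cond) \;\mid\; \mathtt{acquire}(p{\to}\mathit{fld}) \;\mid\; \mathtt{release}(p{\to}\mathit{fld}).$$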


Here, p,q are program variables, fld is a field name, and dereferences are denoted by 
an arrow. The language is strictly typed with base types void, bool, and int. The latter 
represents the mathematical integers, i.e., has an infinite domain. We admit the usual 
conditions over the base types. Using the struct keyword users can specify their own 
types. In addition, nekton supports syntactic sugar like if-then-else, (do-)while loops, 
non-recursive macros, break and return statements, assertions, simultaneous assign- 
ments, and compare-and-swaps. These can be expressed in terms of the core language 
in the expected way. 

The assertion language is a standard separation logic defined over the base types,
heap graphs, and the given flow domain. It has the separating conjunction and classical
implication (no magic wand). Our heap model is divided into a local and a shared heap,
and we use the box operator $\boxed{a}$ to indicate assertions over the shared state. The shared
state is represented by an iterated separating conjunction. Since this conjunction refers
to a set of nodes and we want to reason first-order, we handle it implicitly. We let each
assertion a in a proof outline stand for

$$\exists \bar{x}.\; a \,*\, \mathop{\ast}_{n \,\in\, N \setminus \mathit{Nodes}(a)} \mathit{NInv}(n).$$

The iterated separating conjunction is over all nodes that do not occur in a, and asserts a
node-local invariant for each of them. The existential quantifier is over all logical variables
in the assertion. Keeping it implicit makes the assertions more concise and aids automation.


Node Invariants. nekton expects the node-local invariant NInv(n) as another input.
The role of this invariant is to make use of the flow framework and state global properties
of the data structure in a local way. The invariant would say, for instance, that
sentinel nodes are never marked. Compared to the implication list, the node-local invariant
has the advantage that its claims are actually checked. Technically, the node-local
invariant is a separation logic formula that is only allowed to refer to the given node n
and its fields. It will often define logical variables like n→flow that refer to the entry
of the flow field and can be used outside the node-local invariant. These variables are
quantified away by ∃x̄ above.


Interferences. Interferences are RGSep actions [32] restricted to the format

$$\mathit{NInv}(x).\ \{a\} \rightsquigarrow [\mathit{fld}_1, \ldots, \mathit{fld}_n]\,\{b\}. \tag{2}$$

To give an example, we formulate that a concurrently executed function may mark a
node using the action NInv(x). { ¬(x→marked) } ⇝ [marked] { x→marked }. An action
refers to a single node in the heap graph as described by the above node-local invariant. 
The action applies if the assertion a evaluates to true, and modifies the node in a way 
that satisfies b. Like the invariant, the assertions a and b have to be node-local and only 
refer to the values of x’s fields. The assertions may introduce logical variables that are 
implicitly existentially quantified and whose scope extends over a and b. Such variables 
allow us to relate the pre- and post-state of the interference. The fields given in the 
brackets are the ones that may change under the action. If assertion b does not refer to 
the value of a field that is given in the list, the field may receive arbitrary values. If a 
field is not named, it is guaranteed to stay unchanged. 


3 Case Study 


We present a linearizability proof of the FEMRS tree [4] conducted with nekton. We 
omit the data structure’s maintenance operation because it leads to flow updates that 
neither nekton nor another state-of-the-art technique aimed at automation can handle. 
Each node in the tree stores one key and points to up to two child nodes left and 
right, storing keys with lower and higher values, respectively. In addition, each node 
contains two Boolean fields del and rem for the removal of nodes. This is because 
the tree distinguishes the logical removal, indicated by the del flag, from the physical 
unlinking of a node, indicated by the rem flag. As long as a logically removed node has 
not been unlinked, it can become part of the tree again. The idea is to save the creation 
of new nodes for keys that are physically but no longer logically part of the tree. Lastly, 
every node can be locked. 

Figure 1 depicts a possible state of the FEMRS tree. Each node is labeled with its key.
Dashed nodes have been logically removed. To prove linearizability, we rely on the
keyset framework. The inset flow is used to define the keysets, as explained earlier.
The edges in the figure are labeled with the flow they propagate. The transfer functions
leading to this propagation stem from the following generator gen:

$$\mathit{gen}(\mathit{fld}) \;\triangleq\; \lambda f.\ x{\to}\mathit{del}\ ?\ f \,:\, f \setminus (\mathit{fld} = \mathit{left}\ ?\ [x{\to}\mathit{key}, \infty) \,:\, (-\infty, x{\to}\mathit{key}]).$$

Fig. 1. A state of the FEMRS tree.

The predicates defining the keyset and membership are

$$\begin{aligned}
\mathit{rsp}(x, k) \;\triangleq\;& k \in x{\to}\mathit{flow.is} * k = x{\to}\mathit{key} \\
\lor\;& k \in x{\to}\mathit{flow.is} * k < x{\to}\mathit{key} * x{\to}\mathit{left} = \mathit{nil} \\
\lor\;& k \in x{\to}\mathit{flow.is} * k > x{\to}\mathit{key} * x{\to}\mathit{right} = \mathit{nil} \\
\mathit{cnts}(x, k) \;\triangleq\;& k \in x{\to}\mathit{flow.is} * k = x{\to}\mathit{key} * \neg x{\to}\mathit{del}.
\end{aligned}$$

In the example, rsp(5, 7), rsp(15, 15), rsp(20, 17), cnts(12, 12) and more hold.

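The set of interferences expresses this: (I1) As long as the lock of the node is not
held by the thread under consideration and as long as the node has not been marked
unlinked, the child pointers and the (logical and physical) removal flags may change
arbitrarily. The proof does not rely, e.g., on the fact that the rem flag is raised only once
and only when the del flag is true. (I2) A lock that is not held by the thread may change
arbitrarily. (I3) A node that is being physically unlinked ceases to receive flow. The
following nekton actions formalize this:

$$\begin{aligned}
&\mathit{NInv}(x).\ \{\, x{\to}\mathit{lock} \neq \mathit{owned} * \neg x{\to}\mathit{rem} \,\} \rightsquigarrow [\mathit{left}, \mathit{right}, \mathit{del}, \mathit{rem}]\ \{\, \mathit{true} \,\} && \text{(I1)} \\
&\mathit{NInv}(x).\ \{\, x{\to}\mathit{lock} \neq \mathit{owned} \,\} \rightsquigarrow [\mathit{lock}]\ \{\, \mathit{true} \,\} && \text{(I2)} \\
&\mathit{NInv}(x).\ \{\, x{\to}\mathit{lock} \neq \mathit{owned} * x{\to}\mathit{flow.is} \neq \emptyset * x{\to}\mathit{rem} \,\} \rightsquigarrow [\mathit{is}]\ \{\, x{\to}\mathit{flow.is} = \emptyset \,\} && \text{(I3)}
\end{aligned}$$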


We prove the linearizability of the functions contains(k), insert(k), and 
remove(k). All of them call the auxiliary function locate(k), which returns the last 
edge it traversed during a search for key k. Figure 2 gives the proof outline of locate. 
The proof for the full implementation can be found in [17]. 

We use a product flow domain P(N) × N. The first component is the inset flow with
the generator function discussed above. The second component is the pathcount, whose
gen() simply yields the identity for all edges. The benefit of the product flow is that we
can prove memory safety on the side, while conducting the linearizability proof.

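In the node-local invariant, we introduce logical variables like x→left to make the
proof more readable. We refer to these variables in the generator function. The invariant
for the node pointed to by the shared Root differs from that of the remaining nodes:

$$\begin{aligned}
\mathit{NInv}(x) \;\triangleq\;& x \mapsto (\,\mathit{flow} = (x{\to}\mathit{flow.is},\, x{\to}\mathit{flow.pcount}), \\
& \qquad \mathit{left} = x{\to}\mathit{left},\ \mathit{right} = x{\to}\mathit{right},\ \mathit{key} = x{\to}\mathit{key}, \\
& \qquad \mathit{lock} = x{\to}\mathit{lock},\ \mathit{del} = x{\to}\mathit{del},\ \mathit{rem} = x{\to}\mathit{rem}\,) \\
& *\ \mathit{NInv}_{\mathit{all}}(x) * (x = \mathit{Root} \implies \mathit{NInv}_{\mathit{Root}}(x)) \\
\mathit{NInv}_{\mathit{Root}}(x) \;\triangleq\;& x{\to}\mathit{key} = -\infty * \neg x{\to}\mathit{del} * \neg x{\to}\mathit{rem} \\
& *\ x{\to}\mathit{flow.is} = (-\infty, \infty) * x{\to}\mathit{flow.pcount} = 1 \\
\mathit{NInv}_{\mathit{all}}(x) \;\triangleq\;& (\neg x{\to}\mathit{rem} \implies x{\to}\mathit{key} \in x{\to}\mathit{flow.is}) * x{\to}\mathit{flow.pcount} \le 3 \\
& *\ (x{\to}\mathit{rem} \implies x{\to}\mathit{del}) * (x{\to}\mathit{left} = x{\to}\mathit{right} \implies x{\to}\mathit{left} = \mathit{nil}).
\end{aligned}$$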


The node-local invariant makes the expected claims. The root has key −∞, is neither
logically deleted nor unlinked, has as incoming keys (−∞, ∞) and the pathcount is 1.
These flow values are established by the data structure’s initialization function using an
auxiliary edge with an appropriate generator. For all nodes, we have that their key is
in the inset, provided the node has not yet been unlinked, the path count is at most 3,
a node has to be first logically deleted before it can be unlinked, and the only case in
which the left and the right child can coincide is when they are both the null pointer. We
treat nil as a node outside the set of nodes N. This in particular means the node-local
invariant does not apply to it. It will follow from the definition of the generator function
that the keysets are disjoint. We do not need to state this in the invariant as it is only
important when interpreting the verification results.

 1  { −∞ < k < ∞ ∗ NInv(Root) }
 2  inline <Node*, Node*> locate(data_t k) {
 3    Node* p, c; p = Root; c = Root;
 4    { p = c = Root ∗ NInv(Root) ∗ ⟐[NInv(c) ∗ k ∈ c→flow.is] ∗ −∞ < k < ∞ }
 5    do { NInv(p) ∗ (NInv(c) ∗ ⟐[NInv(c) ∗ k ∈ c→flow.is] ∨ ⟐[NInv(p) ∗ rsp(p, k)] ∗ c = nil) }
 6    { { NInv(p) ∗ NInv(c) ∗ ⟐[NInv(c) ∗ k ∈ c→flow.is] ∗ c→key ≠ k }
 7      p = c;
 8      if (p->key < k) {
 9        assert(p->right = nil || p->right ≠ nil);
10        c = p->right;
11        { NInv(p) ∗ (NInv(c) ∨ ⟐[NInv(p) ∗ p→right = nil] ∗ c = nil)
            ∗ ⟐[NInv(p) ∗ k ∈ p→flow.is] ∗ p→right = c ∗ p→key < k }
12        { NInv(p) ∗ (NInv(c) ∗ ⟐[NInv(c) ∗ k ∈ c→flow.is] ∨ ⟐[NInv(p) ∗ rsp(p, k)] ∗ c = nil) }
13      } else { /* symmetric to 'then' branch */ }
14      { NInv(p) ∗ (NInv(c) ∗ ⟐[NInv(c) ∗ k ∈ c→flow.is] ∨ ⟐[NInv(p) ∗ rsp(p, k)] ∗ c = nil) }
15    } while (c ≠ nil && c->key ≠ k);
16    { NInv(p) ∗ (NInv(c) ∗ ⟐[NInv(c) ∗ k ∈ c→flow.is] ∗ c→key = k
        ∨ ⟐[NInv(p) ∗ rsp(p, k)] ∗ c = nil) }
17    return <p, c>;
18  }

Fig. 2. Proof outline for locate as verified by nekton.

The assertion on line 9 helps our implication engine, which is designed for conjunc- 
tive assertions, deal with the disjunctions. 

We explain the implication between Lines 11 and 12. It starts with the assertion
{ NInv(p) ∗ NInv(c) ∗ ⟐[NInv(p) ∗ k ∈ p→flow.is] ∗ p→right = c ∗ p→key < k }.
To apply the hindsight principle, we derive the following guarantees from the set of
interferences. A node’s key is never changed. The only way a node’s inset can shrink is by
unlinking, after which its left and right pointers are no longer changed. The right
child of p is not nil in the current state. From this information, the hindsight principle
concludes { ⟐[NInv(p) ∗ NInv(c) ∗ k ∈ p→flow.is ∗ p→key < k ∗ p→right = c] }.
Together with the definition of the transfer functions labeling the edges, this assertion
yields { ⟐[NInv(c) ∗ k ∈ c→flow.is] }. Another hindsight application starts with
{ NInv(p) ∗ c = nil ∗ ⟐[NInv(p) ∗ k ∈ p→flow.is] ∗ p→right = c ∗ p→key < k }
and moves the facts known in the current state into the past predicate. The definition of
rsp(x, k) then yields { ⟐[NInv(p) ∗ rsp(p, k)] }.

The full proof consists of 99 lines of code, 48 lines of assertions to prove them lin- 
earizable, and 56 lines of definitions for the flow domain, interferences, and invariants. 
nekton takes 45s to verify the proof’s correctness on an Apple M1 Pro. 


4 Correctness and Implementation 


nekton checks that the verification conditions generated from the given proof outlines 
hold and that the assertions are interference-free. The program logic from [18, 19] then 
gives the following semantic guarantee: no matter how many client threads execute the 
data structure functions, partial correctness holds. That is, if a function is executed from 
a state satisfying the precondition and terminates, it must have reached a state in which 
the postcondition held true. Termination itself is not guaranteed. The postcondition will 
relate the function’s return value to a statement about membership of the given key in 
the data structure, and the keyset framework will allow us to conclude linearizability 
from this relation. The verification conditions will in particular make sure the node 
invariant is maintained. We discuss the actual checks. 

The first step is to derive and check verification conditions for all commands com. If
the command is surrounded by assertions, { p }; com; { q }, the verification condition is
sp(p, com) ⊨ q, i.e., the strongest postcondition sp of p under com entails q. If the assertion
{ q } is not given, nekton completes the given proof by using q = sp(p, com). The
verification conditions for loops are similar. For two consecutive assertions { p }; { q },
as they occur for example at the end of a branch, the verification condition is p ⊨ q.
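The first step can be illustrated on a toy language. The following Python sketch is not nekton's implementation: it assumes two integer variables over a tiny finite domain, represents assertions as Python predicates, and decides entailment by enumerating states instead of calling an SMT solver. All names (sat, sp, assign, assume, skip, check_outline) are hypothetical.

```python
from itertools import product

VARS = ("x", "y")
DOMAIN = range(3)  # toy finite domain so entailment can be checked by enumeration
STATES = [dict(zip(VARS, vals)) for vals in product(DOMAIN, repeat=len(VARS))]


def freeze(state):
    """Hashable representation of a state."""
    return tuple(sorted(state.items()))


def sat(assertion):
    """All toy states that satisfy the assertion."""
    return {freeze(s) for s in STATES if assertion(s)}


def sp(pre_states, com):
    """Strongest postcondition: every state com can reach from pre_states."""
    return {freeze(t) for fs in pre_states for t in com(dict(fs))}


def assign(target, source):
    """Command target := source for two program variables."""
    return lambda s: [{**s, target: s[source]}]


def assume(cond):
    """Command assume(cond): blocks in states that violate cond."""
    return lambda s: [s] if cond(s) else []


def skip(s):
    """No-op, used to model two consecutive assertions { p }; { q }."""
    return [s]


def check_outline(pre, steps):
    """steps is a list of (command, assertion-or-None) pairs."""
    current = sat(pre)
    for com, post in steps:
        reached = sp(current, com)
        if post is None:
            current = reached          # complete the outline with sp(p, com)
        elif reached <= sat(post):     # verification condition sp(p, com) |= q
            current = sat(post)
        else:
            return False
    return True
```

For instance, the outline { x = 0 } y := x; assume(y = 0) { x = 0 ∗ y = 0 } with the intermediate assertion omitted is accepted by check_outline, while { x = 0 } y := x { y = 1 } is rejected.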

The second step is to check that the assertions { p} and { q } in the proof are 
interference-free, i.e., cannot be invalidated by the actions of other threads. 

Finally, nekton checks that the interferences given by the user cover the actual 
interferences of the program. We review the above steps in more detail. 


Strongest Postconditions. The computation of the strongest postcondition follows the 
standard axioms for separation logic [23]. However, they do not deal with the flow 
which may not only be directly modified by com but also indirectly by an update else- 
where. To deal with such indirect updates, nekton computes a footprint fp: a subset 
of the heap locations that the standard axioms require plus those locations whose flow 
changes due to com. The footprint yields a decomposition p = fp ∗ f of predicate p,
where f is a frame that is not affected by the update. From this decomposition, we compute
the strongest postcondition as sp(p, com) = sp(fp, com) ∗ f, using the frame rule.
Actually, nekton also shows that the update maintains the node invariant, which only 
requires a check for sp(fp, com). 

For fp to be a footprint wrt. com, all nodes outside fp should receive the same flow 
from sp(fp, com) as from fp. This holds if fp and sp(fp, com) induce the same flow 
transformer function [20]. To determine a footprint, nekton takes a strategy that is 
justified by lock-free programming [18]. Starting from the updated nodes, it gathers a 
small (fixed) set of locations that forms an acyclic subgraph. Acyclicity guarantees that 
fp and sp(fp, com) have the same transformer iff they agree on the transformation along
all paths: if n belongs to fp and n→fld does not, then n→fld must point to the same
location and transform inflows to outflows in the same way in fp and in sp(fp, com).

The strongest postcondition above is for state-based reasoning. For predicates over
computations, which have state and past predicates, we use the following observation:
past predicates are never invalidated by commands. This allows us to just copy them
to the postcondition: sp(p ∗ ⟐q, com) = sp(p, com) ∗ ⟐p ∗ ⟐q. Note that we add
the precondition as a new past predicate. Moreover, we may add new past predicates
derived by hindsight arguments. As these derived past predicates are implied by the
postcondition, they formally do not strengthen the assertion, but of course help the tool.


Hindsight Reasoning. Recall from Sect. 2 that hindsight reasoning draws conclusions
of the form ⟐p ∗ q ⟹ ⟐r: every computation from a p-state must inevitably transition
through r in order to reach q. In nekton, p and q are restricted to node-local predicates
in the sense defined above, and r is fixed to p ∧ q.

To prove the implication, assume it did not hold. Then there is a computation where
p is invalidated before q is established. This is covered by the interference: there is
an action act_p invalidating p and an action act_q establishing q. Let act_p and act_q be
NInv(n). { o_p } ⇝ [...] { ... } resp. NInv(n). { o_q } ⇝ [...] { ... }. There is (always)
a decomposition o_p = o_p^i ∗ o_p^m such that o_p^i is immutable. Immutability holds if o_p^i
is shared and interference-free. Consequently, o_p^i must still hold when q is established.
Now, we check if o_p^i and o_q are contradictory, o_p^i ∧ o_q = false. If so, act_q is not
enabled after act_p. This, in turn, means q cannot be established after p is invalidated:
the computation cannot exist. nekton draws the hindsight conclusion if it can prove the
contradiction for all pairs act_p, act_q of interferences that invalidate p and establish q.
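The pairwise check can be sketched in Python under strong simplifications that are ours, not nekton's: assertions are conjunctions of field equalities (dictionaries from field names to values), an action is taken to invalidate or establish a predicate whenever it may change a field the predicate mentions, and the immutable part of a precondition consists of those equalities that no enabled action can change. The class Action and the functions below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    pre: dict    # field -> value required for the action to be enabled
    fields: set  # fields that may change under the action
    post: dict   # field -> value after the action; other changed fields are arbitrary


def contradictory(a, b):
    """Two conjunctions of equalities clash if they disagree on some field."""
    return any(f in b and b[f] != v for f, v in a.items())


def stable(fld, value, actions):
    """Is the equality fld = value preserved by every enabled action?"""
    for act in actions:
        if contradictory({fld: value}, act.pre):
            continue  # the action is not enabled when fld = value holds
        if fld in act.fields and act.post.get(fld, object()) != value:
            return False
    return True


def immutable_part(pred, actions):
    return {f: v for f, v in pred.items() if stable(f, v, actions)}


def hindsight(p_fields, q_fields, actions):
    """Draw the conclusion if every invalidating/establishing pair clashes."""
    invalidating = [a for a in actions if a.fields & p_fields]
    establishing = [a for a in actions if a.fields & q_fields]
    return all(
        contradictory(immutable_part(ap.pre, actions), aq.pre)
        for ap in invalidating
        for aq in establishing
    )


# Abstractions of (I1) and (I3) from the case study, ignoring the lock.
I1 = Action("I1", {"rem": False}, {"left", "right", "del", "rem"}, {})
I3 = Action("I3", {"rem": True}, {"is"}, {"is": "empty"})
```

With these two actions, hindsight({"is"}, {"right"}, [I1, I3]) succeeds: the inset can only shrink once rem is set, rem can never be reset, and the right pointer can only change while rem is unset. Adding a hypothetical action that resets rem makes the check fail.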


Entailment. Our assertions p ∗ ∗_{i∈I} ⟐p_i consist of a predicate p for the current
state and a set of past predicates ⟐p_i tracking information about the computation. We
have p ∗ ∗_{i∈I} ⟐p_i ⊨ q ∗ ∗_{j∈J} ⟐q_j if p ⊨ q and ∀j ∃i. ⟐p_i ⊨ ⟐q_j. To show
⟐p_i ⊨ ⟐q_j, we rely on the algorithm for state predicates and prove p_i ⊨ q_j.

Entailment checks p ⊨ q between state predicates decompose into reasoning about
resources and reasoning about logically pure facts. The latter degenerates to an implica- 
tion in classical logic: nekton uses a straightforward encoding into SMT and discharges 
it with Z3 [21]. For reasoning about resources, nekton implements a custom matching 
procedure to correlate the resources in p and q. The procedure is guided by the program 
variables x: if the value of x is a in p and b in q, then a and b are matched, meaning b 
is renamed to a. The procedure then continues to match the fields of already matched 
addresses. Finally, nekton checks syntactically if all the resources in q occur in p. 

If nekton fails to prove an implication, it consults the implication list. It takes the 
implications as they are, and does not try to embed them into a context as would be 
justified by congruence. nekton does not track the precise implications it has used. 


Interference Freedom. A state predicate p is interference-free wrt. act of the form
NInv(n). { r } ⇝ [fld_1, ..., fld_n] { o }, if the strongest postcondition of p under act
entails p itself, sp(p, act) ⊨ p. Towards sp(p, act), let p = NInv(x) ∗ q, meaning
x is an accessible location. Applying act to x in p acts like an assignment to the fields
such that their new values satisfy o. The strongest postcondition for this is standard [3]:

$$\mathit{sp}_x(p, \mathit{act}) \;=\; o[n \backslash x] \,*\, \exists y_1 \cdots y_n.\ (p * r[n \backslash x])[x{\to}\mathit{fld}_1 \backslash y_1, \ldots, x{\to}\mathit{fld}_n \backslash y_n].$$

We strengthen p with the precondition r of act to make sure the action is enabled.
We use r[n\x] for r with n replaced by x, meaning we instantiate r to location x.
We replace the old values of the updated fields with fresh quantified variables and add
the fields’ new valuation o[n\x]. Then, the strongest postcondition sp(p, act) applies
sp_x(p, act) to all locations x in p.
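For a single node with finitely many field values, the interference-freedom check can be carried out by brute force. The sketch below is an illustration under our own assumptions: a node has only a Boolean field marked and a key from a three-element domain, assertions are Python predicates, and logical variables relating the pre- and post-state are not modeled. The names FIELDS, sp_action, and interference_free are hypothetical.

```python
from itertools import product

# Assumed toy node layout: field name -> possible values.
FIELDS = {"marked": (False, True), "key": (0, 1, 2)}


def all_states():
    names = list(FIELDS)
    return [dict(zip(names, vals)) for vals in product(*(FIELDS[n] for n in names))]


def sp_action(p, pre, changed, post):
    """Strongest postcondition of p under the action {pre} ~> [changed] {post}."""
    result = []
    for s in all_states():
        if not (p(s) and pre(s)):
            continue  # the action must be enabled in a p-state
        for t in all_states():
            # Fields that are not listed keep their value; listed fields
            # receive arbitrary values constrained only by post.
            same = all(t[f] == s[f] for f in FIELDS if f not in changed)
            if same and post(t) and t not in result:
                result.append(t)
    return result


def interference_free(p, action):
    """p is interference-free wrt. action if sp(p, action) entails p."""
    return all(p(t) for t in sp_action(p, *action))


# The marking action from Sect. 2: { not marked } ~> [marked] { marked }.
mark = (lambda s: not s["marked"], {"marked"}, lambda s: s["marked"])
```

Under the marking action, the predicate "the node is unmarked" is not interference-free, whereas "the node is marked" and "the key is 1" are.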


Interference Coverage. Consider act_1 = NInv(x). { p } ⇝ [fld_1, ..., fld_n] { q }
and act_2 = NInv(x). { r } ⇝ [fld'_1, ..., fld'_m] { o }. We say that act_1 covers act_2 if
act_1 can produce all updates induced by act_2. This is the case if r ⊨ p, o ⊨ q, and
{ fld'_1, ..., fld'_m } ⊆ { fld_1, ..., fld_n }. It remains to extract the actual interferences
of the program and check if they are covered by the user-specified ones. The extraction
is done while computing the strongest postcondition sp: the computed footprints fp and
sp(fp, com) from above reveal the updated fields as well as the pre- and post-states.
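The coverage condition is a conjunction of two entailments and a set inclusion. A minimal Python sketch, assuming a node with two Boolean fields, assertions given as predicates, and entailment decided by enumeration (the names STATES, entails, and covers are hypothetical):

```python
from itertools import product

# Assumed toy node layout with two Boolean fields.
FIELDS = ("marked", "del")
STATES = [dict(zip(FIELDS, vals)) for vals in product((False, True), repeat=len(FIELDS))]


def entails(a, b):
    """a |= b, checked on all toy states."""
    return all(b(s) for s in STATES if a(s))


def covers(act1, act2):
    """act1 = (p, flds1, q) covers act2 = (r, flds2, o)."""
    p, flds1, q = act1
    r, flds2, o = act2
    return entails(r, p) and entails(o, q) and set(flds2) <= set(flds1)


# A permissive user-specified action and a more specific extracted one.
user = (lambda s: True, {"marked", "del"}, lambda s: True)
extracted = (lambda s: not s["marked"], {"marked"}, lambda s: s["marked"])
```

Here covers(user, extracted) holds, since the user-specified action is enabled whenever the extracted one is and permits all of its updates, while covers(extracted, user) does not.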


Flow Encoding. The flow monoid is not yet parsed from the user input but defined 
programmatically in nekton. The transfer function generator is parsed. nekton has five 
flow domains predefined, including path counting and keysets, which are easy to extend. 
nekton does not check whether the flow monoid is indeed a monoid and satisfies the
requirements of an ω-cpo, nor whether ≤ coincides with the natural partial order.

The main task in dealing with a parametric rather than fixed flow domain is 
to encode predicates involving the flow into SMT formulas. This encoding is then 
used to implement the aforementioned components for strongest postconditions, hind- 
sight, entailment, and interferences. Devising the encoding is challenging because it 
requires a representation of flow values that is sufficiently expressive to define relevant 
flow domains, yet sufficiently restricted to have efficient SMT solver support (we use 
Z3 [21]). With the input format described in Sect. 2, we encode flows using the theory 
of integers and uninterpreted functions. 


Limitations. For the future, we see several directions for extensions of our current 
implementation: (i) a parser for flow monoids rather than a programmatic interface, 
(ii) support for partial annotations that are automatically completed by nekton, (iii) 
the ability to prove atomic triples instead of just linearizability for sets, and (iv) more 
helpful error messages or counterexamples to guide the proof-writing user.

Abstract. We consider the verification of liveness properties for concurrent
programs running on weak memory models. To that end, we
identify notions of fairness that preclude demonic non-determinism, are 
motivated by practical observations, and are amenable to algorithmic 
techniques. We provide both logical and stochastic definitions of our 
fairness notions, and prove that they are equivalent in the context of 
liveness verification. In particular, we show that our fairness allows us 
to reduce the liveness problem (repeated control state reachability) to 
the problem of simple control state reachability. We show that this is 
a general phenomenon by developing a uniform framework which serves 
as the formal foundation of our fairness definition, and can be instanti- 
ated to a wide landscape of memory models. These models include SC, 
TSO, PSO, (Strong/Weak) Release-Acquire, Strong Coherence, FIFO- 
consistency, and RMO. 


1 Introduction 


Safety and liveness properties are the cornerstones of concurrent program verifi- 
cation. While safety and liveness are complementary, verification methodologies 
for the latter tend to be more complicated for two reasons. First, checking safety 
properties, in many cases, can be reduced to the (simple) reachability prob- 
lem, while checking liveness properties usually amounts to checking repeated 
reachability of states [47]. Second, concurrency comes with an inherent schedul- 
ing non-determinism, i.e., at each step, the scheduler may non-deterministically 
select the next process to run. Therefore, liveness properties need to be accom- 
panied by appropriate fairness conditions on the scheduling policies to prohibit 
trivial blocking behaviors [42]. In the example of two processes trying to acquire 
a lock, demonic non-determinism [20] may always favour one process over the 
other, leading to starvation.

Despite the gap in complexity, the verification of liveness properties has
attracted much research in the context of programs running under the classical
Sequential Consistency (SC) [40]. An execution of a program under SC is a
non-deterministically chosen interleaving of its processes’ atomic operations. A 
write by any given process is immediately visible to all other processes, and reads 
are made from the most recent write to the memory location in question. SC is 
(relatively) simple since the only non-determinism comes from interleaving. 

Weak memory models forego the fundamental SC guarantee of immediate 
visibility of writes to optimize for performance. More precisely, a write oper- 
ation by a process may asynchronously be propagated to the other processes. 
The delay could be owed to physical buffers or caches, or could simply be a vir- 
tual one thanks to instruction reorderings allowed by the semantics of the pro- 
gramming language. Hence we have to contend with a (potentially unbounded) 
number of write operations that are “in transit”, i.e., they have been issued by 
a process but they have yet to reach the other processes. In this manner, weak 
memory introduces a second source of non-determinism, namely memory non- 
determinism, reflecting the fact that write operations are non-deterministically 
(asynchronously) propagated to the different processes. Formal models for weak 
memory, ranging from declarative models [8,21,35,39,41] to operational ones 
[15, 30, 43, 46], make copious use of non-determinism (non-determinism over entire
executions in the case of declarative models and non-deterministic transitions in
the case of operational models). While we have seen extensive work on verifying
safety properties for programs running under weak memory models, the literature
on liveness for programs running under weak memory models is relatively
sparse, and only recently have we seen efforts in that direction [5,36].

As mentioned earlier, we need fairness conditions to exclude demonic behav- 
iors when verifying liveness properties. A critical issue here is to come up with 
an appropriate fairness condition, i.e., a condition that (i) is sufficiently strong 
to eliminate demonic non-determinism and (ii) is sufficiently weak to allow all 
“good” program behaviors. To illustrate the idea, let us go back to the case of 
SC. Here, traditional fairness conditions on processes, such as strong fairness 
[31], are too weak if interpreted naively, e.g. “along any program run, each pro- 
cess is scheduled infinitely often”. The problem is that even though a strongly 
fair scheduler may pick a process infinitely often, it may choose to do so only 
in configurations where the process cannot progress since its guards are not sat- 
isfied. Such guards may, for instance, be conditions on the values of the shared 
variables. For example, executions of the program in Fig. 1 may not terminate 
under SC, since the second process may only get scheduled when the value of x 
is 2, thereby looping infinitely around the do-while loop. 

Stronger fairness conditions such as transition fairness, and probabilistic fair- 
ness [11,27] can help avoid this problem. They imply that any transition enabled 
infinitely often is also taken infinitely often (with probability one in the case 
of probabilistic fairness). Transition fairness eliminates demonic scheduler
non-determinism, and hence it is an appropriate notion of fairness in the case of SC.

    r = 0;
    while (r != 1) {                do { s = x; } until (s == 1)
      x = 1; x = 2; r = y;    ||    y = 1;
    }

Fig. 1. Does this program always terminate? Only if we can guarantee that the process
to the right will eventually be scheduled to read when x = 1.


However, it is unable to eliminate demonic memory non-determinism. The rea- 
son is that transition fairness allows runs of the programs where write events 
occur at a higher frequency than the frequency in which they are propagated to 
the processes. This means that, in the long run, a process may only see its own 
writes, potentially preventing its progress and, therefore, the system’s progress 
as a whole. This scenario is illustrated in Fig. 2. 


    do { x = 1; }                  do { x = 2; }
    until (x = 2 or y = 1);        until (x = 1 or y = 1);
    y = 1;                         y = 1;


Fig. 2. This program is guaranteed to terminate under any model only if pending 
propagation is guaranteed to not accumulate unboundedly: e.g. in TSO, each process 
may never see the other’s writes due to an overflowing buffer. 
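The scenario of Fig. 2 can be replayed with a small simulation. The sketch below assumes a textbook TSO model (one FIFO store buffer per process, reads served from the newest matching entry of the own buffer and otherwise from memory) and deterministic schedules that we choose by hand; the class TSO and the functions demonic_run and fair_run are hypothetical and not part of the formal framework.

```python
from collections import deque


class TSO:
    """Two processes running the program of Fig. 2 over per-process store buffers."""

    def __init__(self):
        self.mem = {"x": 0, "y": 0}
        self.buf = [deque(), deque()]  # pending writes, oldest first
        self.pc = ["write", "write"]

    def read(self, pid, var):
        # A process sees its own newest pending write before memory.
        for v, val in reversed(self.buf[pid]):
            if v == var:
                return val
        return self.mem[var]

    def flush(self, pid):
        # Propagate the oldest pending write of pid to memory.
        if self.buf[pid]:
            var, val = self.buf[pid].popleft()
            self.mem[var] = val

    def step(self, pid):
        own, other = (1, 2) if pid == 0 else (2, 1)
        if self.pc[pid] == "write":      # loop body: x = own
            self.buf[pid].append(("x", own))
            self.pc[pid] = "check"
        elif self.pc[pid] == "check":    # until (x = other or y = 1)
            done = self.read(pid, "x") == other or self.read(pid, "y") == 1
            self.pc[pid] = "final" if done else "write"
        elif self.pc[pid] == "final":    # y = 1
            self.buf[pid].append(("y", 1))
            self.pc[pid] = "done"


def demonic_run(rounds):
    """Both processes are scheduled in every round, but nothing is ever flushed."""
    t = TSO()
    for _ in range(rounds):
        t.step(0)
        t.step(1)
    return t


def fair_run():
    """A schedule in which buffers are emptied, so writes become visible."""
    t = TSO()
    t.step(0); t.flush(0)   # x = 1 reaches memory
    t.step(1); t.flush(1)   # x = 2 reaches memory
    t.step(0); t.step(0)    # process 0 reads x = 2, leaves the loop, writes y
    t.flush(0)              # y = 1 reaches memory
    t.step(1); t.step(1)    # process 1 reads y = 1, leaves the loop, writes y
    return t
```

In the demonic run both processes take steps in every round, yet each only ever reads its own buffered value of x while the buffers keep growing. In the second schedule the buffers are drained and both processes finish.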


To deal with memory non-determinism, we exploit the fact that the sizes of 
physical buffers or caches are bounded, and instruction reorderings are bounded 
in scope. Therefore, in any practical setting, the number of writes in transit at 
a given moment cannot be unbounded indefinitely. This is what we seek to cap- 
ture in our formalism. Based on this observation, we propose three new notions 
of fairness that (surprisingly) all turn out to be equivalent in the context of 
liveness. First, we introduce boundedness fairness which only considers runs of 
the system for which there is a bound b on the number of events in transit, in 
each configuration of the run. Note that the value of b is arbitrary (but fixed for 
a given run). Boundedness fairness is apposite: (i) it is sufficiently strong to eliminate
demonic memory non-determinism, and (ii) it is sufficiently weak to allow
all reasonable behaviors (as mentioned above, practical systems will bound the 
number of transient messages). Since we do not fix the value of the bound, this 
allows parameterized reasoning, e.g., about buffers of any size: our framework 
does not depend on the actual value of the bound, only on its mere existence. 
Furthermore, we define two additional related notions of fairness for memory 
non-determinism. The two new notions rely on plain configurations: configurations
in which there are no transient operations (all the write operations have
reached all the processes). First, we consider plain fairness: along each infinite 
run, the set of plain configurations is visited infinitely often, and then define the 
probabilistic version: each run almost surely visits the set of plain configurations. 
We show that the three notions of fairness are equivalent (in Sect. 4, we make
precise the notion of equivalence we use).

After we have defined our fairness conditions, we turn our attention to the 
verification problem. We show that verifying the repeated reachability under 
the three fairness conditions, for a given memory model m, is reducible to the 
simple reachability under m. Since our framework does not perform program 
transformations we can prove liveness properties for program P through proving 
simple reachability on the same program P. As a result, we obtain two important
sets of corollaries. First, if the simple reachability problem is decidable for m,
then the repeated reachability problem under the three fairness conditions is also
decidable. This is the case when the memory model m is TSO, PSO, SRA, etc.
Second, even when the simple reachability problem is not decidable for m, e.g.,
when m is RA or RMO, we still succeed in reducing the verification of liveness
properties under fairness conditions to the verification of simple reachability. This
allows leveraging proof methodologies developed for the verification of safety 
properties under these weak memory models (e.g., [22,29]). 

Having identified the fairness conditions and the verification problem, there 
are two potential approaches, each with its advantages and disadvantages. We 
either instantiate a framework for individual memory models one by one or
define a general framework in which we can specify multiple memory models
and apply the framework “once and for all”. The first approach has the benefit of
making each instantiation more straightforward; however, we always need to
translate our notion of fairness into the specific formulation. In the second app- 
roach, although we incur the cost of heavier machinery, we can subsequently 
take for granted the fact that the notion of fairness is uniform across all models, 
and coincides with our intuition. This allows us to be more systematic in our 
quest to verify liveness. In this paper, we have thus chosen to adopt the second 
approach. We define a general model of weak memory models in which we repre- 
sent write events as sequences of messages ordered per variable and process. We 
augment the message set with additional conditions describing which messages 
have reached which processes. We use this data structure to specify our fairness 
conditions and solve our verification problems. We instantiate our framework to 
apply our results to a wide variety of memory models, such as RMO [12], FIFO 
consistency, RA, SRA, WRA [34,35], TSO [13], PSO, StrongCOH (the relaxed 
fragment of RC11) [30], and SC [40]. 


In summary, we make the following contributions:


— Define new fairness conditions that eliminate demonic memory non- 
determinism. 

— Reduce checking the repeated reachability problem under these fairness con- 
ditions to the simple reachability problem. 

— Introduce a general formalism for weak memory models that allows applying 
our results uniformly to a broad class of memory models. 
— Prove the decidability of liveness properties for models such as TSO, PSO,
SRA, WRA, and StrongCOH, and open the door for leveraging existing proof
frameworks for simple reachability for other models such as RA.


We give an overview of a wide landscape of memory models in Sect. 3.3, and 
provide a high level explanation of the versatility of our framework. 


Structure of the Paper. We begin by casting concurrent programs as transi- 
tion systems in Sect. 2. In Sect. 3, we develop our framework for the memory such 
that the desired fairness properties can be meaningfully defined across several 
models. In Sect. 4, we define useful fairness notions and prove their equivalence. 
Finally, in Sect. 5 we show how the liveness problems of repeated control state 
reachability reduce to the safety problem of control state reachability, and obtain 
decidability results. A full version of this paper is available at [6]. 


2 Modelling Concurrent Programs 


We consider concurrent programs as systems where a set of processes run in 
parallel, computing on a set of process-local variables termed registers and
communicating through a set of shared variables. This inter-process communi- 
cation, which consists of reads from, writes to, and atomic compare-and-swap 
operations on shared variables, is mediated by the memory subsystem. The over- 
all system can be visualized as a composition of the process and memory sub- 
systems working in tandem. In this section we explain how concurrent programs 
naturally induce labelled transition systems. 


2.1 Labelled Transition Systems 


A labelled transition system is a tuple $\mathcal{T} = (\Gamma, \rightarrow, \Lambda)$
where $\Gamma$ is a (possibly infinite) set of configurations,
$\rightarrow \subseteq \Gamma \times \Lambda \times \Gamma$ is a transition relation, and
$\Lambda$ is the set of labels that annotate transitions. We also refer to them as
annotations to disambiguate from instruction labels. We write
$\gamma \xrightarrow{l} \gamma'$ to denote that $(\gamma, l, \gamma') \in \rightarrow$, in
words that there is a transition from $\gamma$ to $\gamma'$ with label $l$. We denote
the transitive closure of $\rightarrow$ by $\xrightarrow{*}$, and the $k$-fold
self-composition (for $k \in \mathbb{N}$) as $\xrightarrow{k}$.


Runs and Paths. A (possibly infinite) sequence of valid transitions
$\rho = \gamma_1 \rightarrow \gamma_2 \rightarrow \gamma_3 \cdots$ is called a run. We say
that a run is a $\gamma$-run if the initial configuration in the run is $\gamma$, and
denote the set of $\gamma$-runs as $\mathsf{Runs}(\gamma)$. We call a (finite) prefix
of a run a path. In some cases transition systems are initialized, i.e. an initial
set $\Gamma_{init} \subseteq \Gamma$ is specified. In such cases, we call runs starting
from some initial configuration
($\gamma_1 \rightarrow \gamma_2 \rightarrow \gamma_3 \cdots$ with $\gamma_1 \in \Gamma_{init}$)
initialized runs.
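
To make these definitions concrete, the following minimal Python sketch represents a finite labelled transition system as a set of triples and computes the k-fold composition and a path check. The class `LTS`, its method names, and the three-configuration `example` are illustrative assumptions, not notation from the paper, and the sketch covers only the finite case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LTS:
    """A finite labelled transition system (hypothetical helper, not from the paper)."""
    configs: frozenset       # the set of configurations
    labels: frozenset        # the set of annotations
    transitions: frozenset   # triples (source, label, target)

    def successors(self, config):
        """All (label, target) pairs enabled from `config`."""
        return {(lab, tgt) for (src, lab, tgt) in self.transitions if src == config}

    def k_step(self, config, k):
        """Configurations reachable from `config` in exactly k transitions."""
        current = {config}
        for _ in range(k):
            current = {tgt for c in current for (_lab, tgt) in self.successors(c)}
        return current

    def is_path(self, configs, labels):
        """Check that configs[0] -labels[0]-> configs[1] -> ... is a valid finite path."""
        if len(configs) != len(labels) + 1:
            return False
        return all((configs[i], labels[i], configs[i + 1]) in self.transitions
                   for i in range(len(labels)))


# A three-configuration example: g1 -a-> g2 -b-> g3 -a-> g1.
example = LTS(
    configs=frozenset({"g1", "g2", "g3"}),
    labels=frozenset({"a", "b"}),
    transitions=frozenset({("g1", "a", "g2"), ("g2", "b", "g3"), ("g3", "a", "g1")}),
)
```

Here `example.k_step("g1", 2)` follows two transitions from `g1`, and `is_path` accepts exactly the finite prefixes of runs.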
2.2 Concurrent Programs


The sequence of instructions executed by each process is dictated by a concurrent
program, which induces a process subsystem. We begin by formulating
the notion of a program. We assume a finite set $P$ of processes that operate
over a (finite) set $X$ of shared variables. Figure 3 gives the grammar for
a small but general assembly-like language that we use for defining the syntax
of concurrent programs. A program instance, prog, is described by a set of
shared variables, var*, followed by the codes of the processes, (proc reg* instr*)*.
Each process $p \in P$ has a finite set $\mathsf{Regs}(p)$ of (local) registers. We assume
w.l.o.g. that the sets of registers of the different processes are disjoint, and define
$\mathsf{Regs}(prog) := \bigcup_{p \in P} \mathsf{Regs}(p)$. We assume that the data domain
of both the shared variables and registers is a finite set $D$, with a special element
$0 \in D$. The code of a process starts by declaring its set of registers, reg*,
followed by a sequence of instructions.


prog  ::= var* (proc reg* instr*)*
instr ::= lbl : stmt
stmt  ::= var := reg | reg := var | reg := CAS(var, reg, reg) |
          reg := expr | if reg then lbl | term


Fig. 3. A simple programming language. 


An instruction i is of the form lbl : stmt where lbl is a unique (across all processes)
instruction label that identifies the instruction, and stmt is a statement. The 
labels comprise the set of values the program counters of the processes may take. 
The problems of (repeated) instruction label reachability, which ask whether a 
section of code is accessed (infinitely often), are of importance to us. 

Read (reg := var) and write (var := reg) statements read the value of a
shared variable into a register, and write the value of a register to a shared 
variable respectively. The CAS statement is the compare-and-swap operation 
which atomically executes a read followed by a write. We consider a non-blocking 
version of the CAS operation which returns a boolean indicating whether the 
operation was successful (the expected value was read and atomically updated 
to the new value). The write is performed only if the read matches the expected 
value. 
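
The non-blocking CAS just described can be sketched as follows. This is a minimal illustration over a plain dictionary standing in for a sequentially consistent store, which is a simplifying assumption: under a weak memory model the read half would be sourced from a message rather than from a single global value. The function name `cas` and its parameters are hypothetical.

```python
def cas(memory, var, expected, new):
    """Non-blocking compare-and-swap on a dict-based store.

    Returns True and writes `new` if the current value equals `expected`;
    otherwise returns False and leaves the store unchanged. Atomicity is
    trivial here because the sketch is single-threaded.
    """
    if memory[var] == expected:
        memory[var] = new
        return True
    return False
```

A failed call does not block or retry; the caller inspects the returned boolean, which matches the `reg := CAS(var, reg, reg)` form of the grammar.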

We assume a set expr of expressions containing a set of operators applied 
to constants and registers without referring to the shared variables. The reg 
:= expr statement updates the value of register reg by evaluating expression 
expr. The exact set of expressions is orthogonal to our treatment, and hence
left uninterpreted. The if-statement has its usual interpretation, and control 
flow commands such as while, for, and goto-statements, can be encoded with 
branching and if-statements as usual. 
2.3 Concurrent Programs as Labelled Transition Systems


We briefly explain the abstraction of a concurrent program as a labelled transi- 
tion system. The details for the process component, i.e. evolution of the program 
counter and register contents, follow naturally. The key utility of this approach 
lies in the modelling of the memory subsystem, to which we devote Sect. 3.


Configurations. A configuration $\gamma$ is expressed as a tuple $((L, R), \gamma_m)$,
where $L$ maps processes to their current program counter values, $R$ maps registers
to their current values, and $\gamma_m$ captures the current state of the (weak) memory.


Transitions. In our model, a step in our system is either: (a) a silent memory 
update, or (b) a process executing its current instruction. In case (a), only the 
memory component $\gamma_m$ of $\gamma$ changes. The relation is governed by the definition
of the memory subsystem. In case (b), if the instruction is the terminal one,
or assigns an expression to a register, or is a conditional, then only the process
component $(L, R)$ of $\gamma$ changes. Here, the relation is obvious. Otherwise, the two
components interact via a read, write or CAS, and both undergo changes. Here 
again, the relation is governed by what the memory subsystem permits. 


Annotations. Silent memory update steps are annotated with m : Upd. Transitions
involving process p executing an instruction that does not involve memory
are annotated with $p : \bot$. On the other hand, $p : R(x, d)$, $p : W(x, d)$, and
$p : CAS(x, d, d', b)$ represent reads, writes and CAS operations by p respectively.
The annotations indicate the variable and the associated values. 

To study this transition system, one must understand which transitions, 
annotated thus, are enabled. For this, it is clear that we must delve into the 
details of the memory subsystem. 


3 A Unified Framework for Weak Memory Models 


In this section, we present our unified framework for representing weak memory 
models. We begin by describing the modelling aspects of our framework at a 
high level. 

We use a message-based framework, where each write event generates a mes- 
sage. A process can use a write event to justify its read only if the correspond- 
ing message has been propagated to it by the memory subsystem. The total 
chronological order in which a process p writes to variable x is given by poloc 
(per-location program order). We work with models where the order of propaga- 
tion is consistent with poloc. This holds for several models of varying strengths. 
This requirement allows us to organise messages into per-variable, per-process 
channels. We discuss these aspects of the framework in Sect. 3.1. Weak memory
models define additional causal dependencies over poloc. Reading a message
may cause other messages it is dependent on to become illegible. We discuss 
our mechanism to capture these dependencies in Sect. 3.2. The strength of the 
constraints levied by causal dependencies varies according to memory model. 
In Sect. 3.3, we briefly explain how our framework allows us to express causality
constraints of varying strength, by considering a wide landscape of weak memory 
models. We refer the reader to [6] for the technical details of the instantiations. 


3.1 Message Structures 


Message. A write by a process to a variable leads to the formation of a message,
which first and foremost records the value being written. In order to ensure
atomicity, a message also records a boolean denoting whether the message can
be used to justify the read of an atomic read-write operation, i.e. CAS. Finally,
to help with the tracking of causal dependencies generated by read events, a
message records a set of processes $seen \subseteq P$ that have sourced a read from it.
Thus, a message is a triple and we define the set of messages as:
$\mathsf{Msgs} = D \times \mathbb{B} \times 2^{P}$.


Channels. A channel $e(x, p)$ is the sequence of messages corresponding to writes
to $x$ by process $p$. The total poloc order of these writes naturally induces the
channel order. By design, we will ensure that the configuration holds finitely
many messages in each channel. We model each channel as a word over the
message set: $e(x, p) \in \mathsf{Msgs}^*$. A message structure is a collection of these
channels: $e : X \times P \to \mathsf{Msgs}^*$.
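
As an illustration of these two definitions, the sketch below models a message as a small record and a message structure as a dictionary from (variable, process) pairs to lists. All names (`Message`, `cas_ok`, `empty_structure`, `write`) are hypothetical, and the default of `cas_ok=True` for a fresh write is an assumption made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    value: int                               # written value, an element of D
    cas_ok: bool                             # may this message justify the read of a CAS?
    seen: set = field(default_factory=set)   # processes that sourced a read from it


def empty_structure(variables, processes):
    """Message structure e : X x P -> Msgs*, with every channel initially empty."""
    return {(x, p): [] for x in variables for p in processes}


def write(e, x, p, value, cas_ok=True):
    """Append a new message to channel e(x, p); list order mirrors poloc."""
    msg = Message(value, cas_ok)
    e[(x, p)].append(msg)
    return msg
```

Because each channel is an append-only list per variable and per process, the channel order coincides with the order in which that process wrote to that variable.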


3.2 Ensuring Consistency of Executions 


Memory models impose constraints restricting the set of messages that can be
read by a process. The framework uses state elements frontier, source, constraint
that help enforce these constraints. These elements reference positions within
each channel, which we now discuss.


Channel Positions. The channel order provides the order of propagation of
write messages to any process (which in turn is aligned with poloc). Thus, for
any process $p'$, channel $e(x, p)$ is partitioned into a prefix of messages that are
outdated, a null or singleton set of messages that can be used to justify a read,
and a suffix of messages that are yet to be propagated. In order to express these
partitions, we need to identify not only nodal positions, but also internodes
(the spaces between nodes). To this end, we index channels using the set
$W = \mathbb{N} \cup \mathbb{N}^+$. Positions indexed by $\mathbb{N}$ denote nodal positions
(with a message), while positions indexed by $\mathbb{N}^+$ denote internodes. For a
channel of length $n$, the positions are ordered as:
$\top = 0^+ < 1 < 1^+ < 2 < \cdots < n < n^+ = \bot$. A process can read from
the message located at $e(\cdot, \cdot)[i]$ for $i \in \mathbb{N}$.
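
One way to realise this ordering in code is to encode each position as a pair, so that built-in tuple comparison reproduces the interleaving of nodes and internodes. The encoding and the helper names `node`, `internode`, `positions`, and `is_readable` are assumptions chosen for illustration, not part of the paper's formalism.

```python
def node(i):
    """Nodal position i (i >= 1), holding the i-th message of a channel."""
    return (i, 0)


def internode(i):
    """Internode position i^+ (i >= 0), the gap just after node i."""
    return (i, 1)


def positions(n):
    """All positions of a channel of length n, in increasing order."""
    result = [internode(0)]              # top = 0^+
    for i in range(1, n + 1):
        result += [node(i), internode(i)]
    return result                        # last element is n^+ = bottom


def is_readable(pos):
    """Only nodal positions carry a message that can be read."""
    return pos[1] == 0
```

With this encoding, `internode(0) < node(1) < internode(1) < node(2)` holds under ordinary tuple comparison, so `max` can be used directly to advance a frontier.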


Frontier. With respect to a given process, a message can either have been
propagated but not readable, propagated and readable, or not propagated at all.
Since the propagation order of messages follows channel order, the propagated
set of messages forms a prefix of the channel. This prefix-partitioning is achieved
by a map $frontier : P \times X \times P \to W$. If $frontier(p, \cdot, \cdot)$ is an internode
(of the form $i^+$) then the message $v = e[i]$ has been propagated to $p$, but cannot
be read because it is outdated. On the other hand, if
$frontier(p, \cdot, \cdot) = i \in \mathbb{N}$, then the message $e[i]$ can be read by the
process. In Fig. 4, $frontier(p_1, x, p_1/p_2/p_3)$ equal $v_1^+/v_2/v_3$ respectively
(the colored nodes). Consequently, the message at index $v_1$ (or the ones before
it) are unreadable (denoted by the pattern). On the other hand the messages at
$v_2, v_3$ are readable.


[Figures 4 and 5: diagrams of the channels e(x, p1), e(x, p2), e(x, p3) with
nodes v1, v2, v3; images not reproduced.]

Fig. 4. Frontier and source. Fig. 5. Constraint.


Source. Given process $p$ and variable $x$, the process can potentially source
the read from any of the $|P|$ channels on $x$. The second state element,
$source : P \times X \to P$, performs arbitration over this choice of read sources:
$p$ can read $v$ only if $v = frontier(p, x, source(p, x))$. In Fig. 4, while both nodes
$v_2, v_3$ are not outdated, $source(p_1, x) = p_3$, making $v_3$ the (checkered) node
which $p_1$ reads from.


Constraint. The constraint element tracks causal dependencies between messages.
For each message m and each channel, it identifies the last message on the
channel that is a causal predecessor of m. It is defined as a map
$constraint : V \times X \times P \to W$. Figure 5 illustrates possible
$constraint(v_3, \cdot, \cdot)$ pointers for message node $v_3$ in the context of the
channel configuration in Fig. 4.


Garbage Collection. The frontier state marks the last messages in each channel that can
be read by a process. Messages that are earlier than the frontier of all processes 
can be effectively eliminated from the system since they are illegible. We call 
this garbage collection (denoted as GC). 


The overall memory configuration,

$$\gamma_m = \big(e,\ \underbrace{(P \times X \times P \to W)}_{frontier},\ \underbrace{(P \times X \to P)}_{source},\ \underbrace{(V \times X \times P \to W)}_{constraint}\big)$$

consists of the message structure along with the consistency enforcing state.




Read Transition. Our framework allows a unified read transition relation
which is independent of the memory model that we work with. We now discuss
this transition rule, which is given in Fig. 6. Suppose process $p$ is reading
from variable $x$. First, we identify the arbitrated process $p_s$ which is read from,
using the source state. Then we pick the message on the $(x, p_s)$ channel which the
frontier of $p$ points to. Note that this must be a nodal position (indexed by
$\mathbb{N}$). The read value is the value in this message. Finally, we update
$frontier(p, \cdot, \cdot)$ to reflect the fact that all messages in the causal prefix of
the read message have propagated to $p$.


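$$\frac{\begin{array}{c}
p_s = \gamma.source(p, x) \qquad v = \gamma.frontier(p, x, p_s) \qquad v.value = d \\
\gamma_1 = \gamma[v.seen \leftarrow v.seen \cup \{p\}] \\
\gamma_2 = GC(\gamma_1[\lambda y.\lambda p'.\ frontier(p, y, p') \leftarrow \max(frontier(p, y, p'), constraint(v, y, p'))])
\end{array}}{\gamma \xrightarrow{p : R(x, d)}_m \gamma_2}$$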

Fig. 6. The read transition, common to all models across the framework. For
$\gamma \in \Gamma_m$, $\gamma.frontier$, $\gamma.source$, $\gamma.constraint$ represent the
respective components of $\gamma$. For a node $v \in \mathsf{Msgs}$, $v.value \in D$
represents the written value in the message node $v$.
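
The read rule can be approximated by the following Python sketch. It is a simplification under stated assumptions: positions are encoded as pairs `(i, 0)` for node i and `(i, 1)` for internode i^+, message nodes are identified by the triple `(x, q, i)`, a missing constraint entry is treated as the top position, and garbage collection is omitted entirely. The dictionary layout and the function name `read` are hypothetical.

```python
def read(state, p, x):
    """Perform p : R(x, d) and return d, or None if no message is readable.

    `state` is a dict with keys (all hypothetical names):
      "e"          : {(x, q): [{"value": d, "seen": set()}, ...]}
      "frontier"   : {(p, x, q): position}
      "source"     : {(p, x): q}
      "constraint" : {((x, q, i), y, r): position}   # keyed by node identity
    """
    ps = state["source"][(p, x)]                 # arbitrated source process
    idx, kind = state["frontier"][(p, x, ps)]
    if kind != 0:
        return None                              # frontier is an internode: nothing to read
    msg = state["e"][(x, ps)][idx - 1]           # nodes are numbered from 1
    msg["seen"].add(p)                           # record that p sourced a read from it
    v = (x, ps, idx)
    for (q, y, r), pos in list(state["frontier"].items()):
        if q == p:
            # Missing entries default to top = 0^+, which leaves the frontier unchanged.
            bound = state["constraint"].get((v, y, r), (0, 1))
            state["frontier"][(q, y, r)] = max(pos, bound)
    return msg["value"]
```

The `max` in the loop mirrors the frontier update of the rule: reading a message can only move the reader's frontiers forward, never backward.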


Example 1 (Store Buffer (SB)). Fig. 7 shows the Store Buffer (SB) litmus test.
The annotated outcome of store buffering is possible in all WRA/RA/SRA/TSO
models. Right after $p_1$ (resp. $p_2$) has performed both its writes to $x$ (resp. $y$),
we have $e(y, p_2) = \top\, v_y^0\, v_y^1\, \bot$, and $e(x, p_1) = \top\, v_x^0\, v_x^1\, \bot$.

[Figure 7: the SB litmus test program; image not reproduced.]

Fig. 7. SB

This example illustrates how weak memory models allow non-deterministic delays
in the propagation of messages. In this example, $frontier(p_2, x, p_1) = v_x^0$ and
$frontier(p_1, y, p_2) = v_y^0$, so both processes see non-recent messages. On the
other hand, the annotated outcomes are observed if $source(p_1, y) = p_2$ and
$source(p_2, x) = p_1$.
We now turn to a toy example (Fig. 8) to illustrate the dependency enforcing 
and book-keeping mechanisms we have introduced. 


Example 2. Consider a program with two shared variables, $x, y$, and two
processes, $p_1, p_2$. We omit the channel $e(y, p_2)$ for space. Process $p_1$'s frontiers
are shown in violet, $p_2$'s frontiers are shown in orange. We begin with the first
memory configuration. The arrow depicts $constraint(v_1, y, p_1) = v_2$. This situation
can arise in a causally consistent model where the writer of $v_1$ was aware of $v_2$
before writing $v_1$. The first transition shows $p_2$ updating and moving its frontier
(to $v_1$). This results in a redundant node ($v_3$ in hashed texture) since the frontier
of both $p_1$ and $p_2$ has crossed it. This is cleaned up by GC. Now, $p_2$ begins its
read from $v_1$. Reading $v_1$, albeit on $x$, makes all writes by $p_1$ to $y$ prior to $v_2$
redundant. When $p_2$ reads $v_1$, its frontier on $e(y, p_1)$ advances as prescribed by
$constraint(v_1, y, p_1)$, as shown in the fourth memory configuration. Note that this
makes another message ($v_4$) redundant: all frontiers are past it. Once again, GC
discards the message, obtaining the last configuration.


[Figure 8: successive memory configurations over the channels (x, p1), (x, p2),
(y, p1), connected by update, GC, read, and GC steps; image not reproduced.]

Fig. 8. Update, constraint in action during a read, and garbage collection


3.3 Instantiating the Framework 


Versatility of the Framework. The framework we introduce can be instanti- 
ated to RMO [12], FIFO consistency, RA, SRA, WRA [34,35], TSO [13], PSO, 
StrongCOH (the relaxed fragment of RC11) [30], and SC [40]. 

This claim is established by constructing semantics for each of these models 
using the components that we have discussed. We provide a summary of the 
insights, and defer the technical details to the full version [6]. 


[Figure 9: hierarchy diagram of the memory models Relaxed Memory Order (RMO),
StrongCOH, FIFO consistency, Weak Release Acquire (WRA), Partial Store Order
(PSO), Release Acquire (RA), Strong Release Acquire (SRA), Total Store Order
(TSO), and Sequential Consistency (SC), with litmus tests separating
neighbouring models (labelled "PSO, not WRA", "SRA, not PSO", "PSO, not FIFO",
and "FIFO, not StrongCOH"); image not reproduced.]

Fig. 9. Memory models, arranged by their strength. An arrow from A to B denotes
that B is strictly more restrictive than A. A green check (resp. red cross) denotes the
control state reachability is decidable (resp. undecidable). (Color figure online)

We briefly explain how our framework accounts for the increasingly restrictive
strength of these memory models. The weakest of these is RMO, which only
enforces poloc. There are no other causal dependencies, and thus for any message
the constraint on other channels is $\top$. RMO can be strengthened in two ways:
StrongCOH does it by requiring a total order on writes to the same variable, i.e.
$mo_x$. Here the constraint is nontrivial only on channels of the same variable. On
the other hand, FIFO enforces consistency with respect to the program order. 
Here, the constraint is nontrivial only on channels of the same process. WRA 
strengthens FIFO by enabling reads to enforce causal dependencies between 
write messages. This is captured by the non-trivial constraint, and we note that 
seen (the set of processes to have sourced a read from a message) plays a crucial 
role here. RA enforces the $mo_x$ of StrongCOH as well as the causal dependencies
of WRA. PSO strengthens StrongCOH by requiring a stronger precondition on 
the execution of an atomic read-write. More precisely, in any given configuration, 
for every variable, there is at most one write message that can be used to source 
a CAS operation, i.e. with the CAS flag set to true. SRA and TSO respectively 
strengthen RA and PSO by doing away with write races. Here, the Boolean CAS 
flag in messages is all-important as an enforcer of atomicity. TSO strengthens 
SRA in the same way as PSO strengthens StrongCOH. Finally, when we get to 
SC, the model is so strong that all messages are instantly propagated. Here, for 
any message, the pointer on other channels is $\bot$.


4 Fairness Properties 


Towards the goal of verifying liveness, we use the framework we developed to 
introduce fairness properties in the classical and probabilistic settings in
Sect. 4.1 and Sect. 4.2 respectively. Our approach thus has the advantage of gen- 
eralising over weak memory. In Sect. 4.3 we relate these fairness properties in 
the context of repeated control state reachability: a key liveness problem. 


4.1 Transition and Memory Fairness 


In this section, we consider fairness in the classical (non-probabilistic) case. We 
begin by defining transition fairness [11], which is a standard notion of fairness
that disallows executions which neglect certain transitions while only taking 
others. For utility in weak memory liveness, we then augment transition fairness 
to meet practical assumptions on the memory subsystem. Transition fairness and 
probabilistic fairness are intrinsically linked [27, Section 11]. Our augmentations 
are designed to carry over to the probabilistic domain in the same spirit. 


Definition 1 (Transition fairness, [11]). We say that a program execution 
is transition fair if for every configuration that is reached infinitely often, each 
transition enabled from it is also taken infinitely often. 


We argued the necessity of transition fairness in the introduction; however, it 
is vacuously satisfied by an execution that visits any configuration only finitely 
often. This could certainly be the case in weak memory, where there are infinitely 
many configurations. To make a case for the implausibility of this scenario, we 
begin by characterising classes of weak memory configurations. 
Definition 2 (Configuration size). Let $\gamma$ be a program configuration with
memory component $(e, frontier, source, constraint)$. We denote the configuration
size by $size(\gamma)$ and it is defined as $\sum_{x} \sum_{p} len(e(x, p))$, i.e. the total
number of messages in the message structure.


Intuitively, the size of the configuration is the number of messages “in transit”, 
and hence a measure of the weakness of the behaviour of the execution. We note 
that overly weak behaviour is rarely observed in practice [23,45]. For instance, 
instruction reorderings that could be observed as weak behaviour are limited in 
scope. Another source of weak behaviour is the actual reading of stale values 
at runtime. However, the hardware (i.e. caches, buffers, etc.) that stores these 
values is finite, and is typically flushed regularly. Informally, the finite footprint 
of the system architecture (e.g. micro-architecture) implies a bound, albeit hard
to compute, on the size of the memory subsystem. Thus, we use the notion of 
configuration size to define: 


Definition 3 (Size Bounded Executions). An execution $\gamma_0, \gamma_1, \ldots$ is said
to be size bounded, if there exists an $N$ such that for all $n \in \mathbb{N}$,
$size(\gamma_n) \le N$. If this $N$ is specified, we refer to the execution as $N$-bounded.


Already, the requirement of size-boundedness enables our system to refine our 
practical heuristics. However, if the bound N is unknown, it isn’t immediately 
clear how this translates into a technique for liveness verification. We will now 
use the same rationale to motivate and develop an alternate augmentation which 
lends itself more naturally to algorithmic techniques. Recall that we intuitively 
relate the size of the configuration to the extent of weak behaviour. Now, consider 
Sequential Consistency, the strongest of the models. All messages are propagated 
immediately, and hence, the configuration has minimal size throughout. We call 
minimally sized configurations plain, and they are of particular interest to us: 


Definition 4 (Plain message structure). A message structure
$(V, msgmap, e)$ is called plain, if for each variable $x$, $\sum_{p} len(e(x, p)) = 1$.


Drawing a parallel with SC, one could reason that the recurrence of plain con- 
figurations is a hallmark of a system that doesn’t exhibit overly weak behaviour. 
This is captured with the following fairness condition. 


Definition 5 (Repeatedly Plain Executions). An execution $\gamma_0, \gamma_1, \ldots$ is
said to be repeatedly plain, if $\gamma_i$ is a plain configuration for infinitely many $i$.


Following the memory transition system introduced in Sect. 2 and Sect. 3,
we observe that every configuration has a (finite) path to some plain config- 
uration (by performing a sequence of update steps). Hence, if a configuration 
is visited infinitely often in a fair execution, a plain configuration will also be 
visited infinitely often. Consequently, size bounded transition fair runs are also 
repeatedly plain transition fair. 
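
The size and plainness conditions are simple to state over a concrete message structure. The sketch below assumes a structure is a dictionary from (variable, process) pairs to lists of messages, treats the bound as inclusive, and checks boundedness only on a finite prefix of an execution, so it can refute N-boundedness but cannot establish it for an infinite run. The function names are hypothetical.

```python
def size(e):
    """Configuration size: total number of messages over all channels."""
    return sum(len(channel) for channel in e.values())


def is_plain(e, variables):
    """Plain: for each variable, exactly one message across all of its channels."""
    return all(
        sum(len(ch) for (x, _p), ch in e.items() if x == var) == 1
        for var in variables
    )


def is_n_bounded(prefix, n):
    """Check size <= n for every message structure in a finite execution prefix."""
    return all(size(e) <= n for e in prefix)
```

For two variables and two processes, a structure holding one message on `("x", "p1")` and one on `("y", "p2")` is plain, while any structure with two pending messages on the same variable is not.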
4.2 Probabilistic Memory Fairness


Problems considered in a purely logical setting ask whether all executions satis- 
fying a fairness condition fulfill a liveness requirement. However, if the answer is 
negative, one might be interested in quantifying the fair executions which do not 
repeatedly reach the control state. We perform this quantification by consider- 
ing the probabilistic variant of the model proposed earlier, and defining fairness 
analogously as a property of Markov Chains. 


Markov chains. A Markov chain is a pair $\mathcal{C} = (\Gamma, M)$ where $\Gamma$ is a
(possibly infinite) set of configurations and $M$ is the transition matrix which
assigns to each possible transition a transition probability:
$M : \Gamma \times \Gamma \to [0, 1]$. Indeed, this matrix needs to be stochastic, i.e.,
$\sum_{\gamma' \in \Gamma} M(\gamma, \gamma') = 1$ should hold for all configurations $\gamma$.

We can convert our concurrent program transition system (Sect. 2) into a Markov
chain by adding probabilities to the transitions. We assign $M(\gamma, \gamma')$ a nonzero
value if and only if the transition $\gamma \rightarrow \gamma'$ is allowed in the underlying
transition system. Markov chain executions are, by construction, transition fair with
probability 1. We now present the analog of the repeatedly plain condition.¹
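
For a finite transition system, one simple way to obtain such a chain is to give every enabled transition the same probability. The uniform choice is an assumption made for illustration only; the text merely requires that a transition has nonzero probability exactly when it is allowed. The helper names `uniform_chain` and `is_stochastic` are hypothetical.

```python
from fractions import Fraction


def uniform_chain(successors):
    """Assign equal probability to every enabled transition.

    `successors` maps each configuration to the set of its successors in the
    underlying transition system; every configuration needs at least one.
    Pairs that are absent from the result have probability zero.
    """
    return {
        (g, h): Fraction(1, len(succ))
        for g, succ in successors.items()
        for h in succ
    }


def is_stochastic(matrix, configs):
    """Check that the outgoing probabilities of each configuration sum to 1."""
    return all(
        sum(pr for (g, _h), pr in matrix.items() if g == c) == 1
        for c in configs
    )
```

Using exact fractions avoids floating-point rounding when checking that each row sums to one.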


Definition 6 (Probabilistic Memory Fairness). A Markov chain is considered
to satisfy probabilistic memory fairness if a plain configuration is reached
infinitely often with probability one. 


This parallel has immense utility because verifying liveness properties for 
a class of Markov Chains called Decisive Markov Chains is well studied. [7] 
establishes that the existence of a finite attractor, i.e. a finite set of states F that 
is repeatedly reached with probability 1, is sufficient for decisiveness. The above 
definition asserts that the set of plain configurations is a finite attractor. 


4.3 Relating Fairness Notions 


Although repeatedly plain transition fairness is weaker than size bounded transi- 
tion fairness and probabilistic memory fairness, these three notions are equivalent 
with respect to canonical liveness problems, i.e. repeated control state reacha- 
bility and termination. The proof we present for repeated reachability can be 
adapted for termination. 


Theorem 1. There exists $N_0 \in \mathbb{N}$ such that for all $N \ge N_0$, the following
are equivalent for any control state (program counters and register values) $c$:

1. All repeatedly plain transition fair runs visit $c$ infinitely often.
2. All $N$-bounded transition fair runs visit $c$ infinitely often.
3. $c$ is visited infinitely often under probabilistic memory fairness with
   probability 1.


1 A concrete Markov Chain satisfying the declarative definition may be adapted from 
the one described in [5] in a similar setting. 
Proof. For each $N \in \mathbb{N}$, we construct a connectivity graph $G_N$. The vertices
are the finitely many plain configurations $\gamma$, along with the finitely many control
states $c$. We draw a directed edge from $\gamma_i$ to $\gamma_j$ if $\gamma_j$ is reachable from
$\gamma_i$ via configurations of size at most $N$. We additionally draw an edge from a
plain configuration $\gamma$ to control state $c$ iff $c$ is reachable from $\gamma$ via
configurations of size at most $N$. We similarly construct a connectivity graph $G$
without bounds on intermediate configuration sizes. We note:

1. There are only finitely many possibilities for $G_N$.

2. As $N$ increases, edges can only be added to $G_N$. This guarantees saturation.

3. Any witness of reachability is necessarily finite, hence the saturated graph is
   the same as $G$, i.e. for all sufficiently large $N$, $G_N = G$.


Since plain configurations are attractors, the graph $G$ is instrumental in deciding
repeated control state reachability. Consider the restriction of $G$ to plain
configurations, i.e. $G_\Gamma$. Transition fairness (resp. Markov fairness) implies that $\gamma$ is
visited infinitely often (resp. with probability 1) only if it is in a bottom strongly
connected component (scc). In turn, any control state $c$ will be guaranteed to be
reached infinitely often if and only if it is reachable from every bottom scc of
$G_\Gamma$. The if direction follows using the transition fairness and attractor property,
while the converse follows by simply identifying a finite path to a bottom scc
from which $c$ is not reachable. The equivalence follows because the underlying
graph is canonical for all three notions of fairness.
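
The criterion at the end of the proof (a control state is visited infinitely often iff it is reachable from every bottom scc of the graph restricted to plain configurations) can be illustrated on a small explicit graph. The following Python sketch is only an illustration under assumptions: the connectivity graph is given as a finite adjacency dictionary, and the names `reachable`, `bottom_sccs` and `visited_infinitely_often` are hypothetical helpers, not part of the paper.

```python
def reachable(graph, src):
    """Return the set of vertices reachable from src (including src)."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def bottom_sccs(plain, graph):
    """Bottom SCCs of the graph restricted to the plain configurations."""
    restricted = {u: [v for v in graph.get(u, ()) if v in plain] for u in plain}
    reach = {u: reachable(restricted, u) for u in plain}
    sccs = []
    for u in plain:
        scc = frozenset(v for v in reach[u] if u in reach[v])
        # An SCC is bottom when nothing outside it is reachable from it.
        if reach[u] == set(scc) and scc not in sccs:
            sccs.append(scc)
    return sccs


def visited_infinitely_often(plain, graph, c):
    """True iff control state c is reachable from every bottom SCC."""
    return all(c in reachable(graph, next(iter(scc)))
               for scc in bottom_sccs(plain, graph))


# Hypothetical example: g1 and g2 each form a bottom SCC; only g1 reaches c.
plain = {"g0", "g1", "g2"}
graph = {"g0": ["g1", "g2"], "g1": ["g1", "c"], "g2": ["g2"]}
print(visited_infinitely_often(plain, graph, "c"))
```

In this example the answer is negative because a fair run may settle in the bottom scc `{g2}`, from which `c` is unreachable.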


5 Applying Fairness Properties to Decision Problems 


In this section, we show how to decide liveness as a corollary of the proof of 
Theorem 1. We begin by noting that techniques for termination are subsumed 
by those for repeated control state reachability. This is because termination is 
not guaranteed iff one can reach a plain configuration from which a terminal 
control state is inaccessible. Hence, in the sequel, we focus on repeated control 
state reachability. 


5.1 Deciding Repeated Control State Reachability 


We observe that, under the fairness conditions we defined, liveness (i.e. repeated
control state reachability) reduces to a safety query.


Problem 1 (Repeated control state reachability). Given a control state (program 
counters and register values) c, do all infinite executions (in the probabilistic 
case, a set of measure 1) satisfying fairness condition A reach c infinitely often? 


Problem 2 (Control state reachability). Given a control state $c$ and a configuration
$\gamma$, is $(c, m)$ reachable from $\gamma$ for some $m$?


Theorem 2. Problem 1 for repeatedly plain transition fairness and probabilistic
memory fairness reduces to Problem 2. Moreover, the reduction can compute
the $N_0$ from Theorem 1 such that it further applies to size bounded transition
fairness.


Proof. This follows by using Problem 2 to compute the underlying connectivity
graph $G$ from the proof of Theorem 1. A small technical hurdle is that plain
configuration reachability is not the same as control state reachability. However,
the key to encoding this problem as a control state query is the following
property: for a configuration $\gamma$ and a message $m$ ($\in e(x, p)$), if for every process
$p'$, $m$ is not redundant (formally, $\mathit{frontier}(p', x, p) < m$), then there exists a plain
configuration $\gamma'$ containing $m$ such that $\gamma'$ is reachable from $\gamma$ via a sequence
of update steps. The plan, therefore, is to read and verify whether the messages
desired in the plain configuration are, and remain, accessible to all processes.
Finally, the computation of $N_0$ follows by enumerating $G_N$.


5.2 Quantitative Control State Repeated Reachability 


We set the context of a Markov chain $C = (\Gamma, M)$ that refines the underlying
transition system induced by the program. We consider the quantitative
variant of repeated reachability where, instead of just knowing whether the
probability is one or not, we are interested in computing it.


Problem 3 (Quantitative control state repeated reachability). Given a control
state $c$ and an error margin $\epsilon \in \mathbb{R}$, find a $\delta$ such that for Markov chain $C$,
$|\mathrm{Prob}(\gamma_{init} \models \Box\Diamond c) - \delta| < \epsilon$.


We refer the reader to [6] for details on the standard reduction, from which 
the following result follows: 


Theorem 3. If Problem 2 is decidable for a memory model, then Problem 3 is
computable for Markov chains that satisfy probabilistic memory fairness.


5.3 Adapting Subroutines to Our Memory Framework 


We now briefly sketch how to adapt known solutions to Problem 2 for PSO, 
TSO, StrongCOH, WRA and SRA to our framework. 


PSO and TSO. Reachability between plain configurations (a special case of 
Problem 2) under these models has already been proven decidable [12]. The 
store buffer framework is similar to the one we describe, and hence the results 
go through. Moreover, [5, Lemmas 3, 4] shows the decidability of our Problem 2 
for TSO. The same construction, which uses an augmented program to reduce 
to ex-plain configuration reachability, works for PSO as well.


StrongCOH. Decidability of reachability under StrongCOH is shown in [1].
The framework used, although quite different in notation, is roughly isomorphic
to the one we propose. The relaxed semantics of StrongCOH allow the framework
to be set up as a WSTS [2,26], which supports backward reachability analysis,
yielding decidability. Backward reachability gives an upward closed set of states
that can reach a target label. Checking whether an arbitrary state is in this
upward closed set requires a comparison with only the finitely many elements in
the basis. This solves Problem 2.
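
The membership test against a finite basis can be sketched as follows. This is a generic illustration only: it assumes states are tuples of natural numbers ordered componentwise (a well-quasi-order by Dickson's lemma), which is not the actual ordering on StrongCOH configurations, and the function names are hypothetical.

```python
def leq(a, b):
    """Componentwise order on equal-length tuples of naturals."""
    return len(a) == len(b) and all(x <= y for x, y in zip(a, b))


def minimize(elements):
    """Keep only the minimal elements: a finite basis of the upward closure."""
    elems = set(elements)
    return {a for a in elems if not any(b != a and leq(b, a) for b in elems)}


def in_upward_closure(state, basis):
    """A state lies in the upward closed set iff it dominates a basis element."""
    return any(leq(b, state) for b in basis)


# Hypothetical result of a backward reachability analysis.
basis = minimize([(1, 0), (2, 0), (0, 3), (1, 1)])
print(sorted(basis))                      # minimal elements only
print(in_upward_closure((5, 2), basis))   # dominates (1, 0)
print(in_upward_closure((0, 2), basis))   # dominates no basis element
```

The point of the design is that the (possibly infinite) upward closed set is never enumerated; only the finitely many minimal elements are compared against the query state.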


WRA and SRA. Decidability of reachability under WRA and SRA has
recently been shown in [34]. The proof follows the WSTS approach, but
the model used in the proof has different syntax and semantics from the one we
present here. A reconciliation is possible, however, and we briefly sketch it here.
A state in the proof model is a map from processes to potentials. A potential
is a finite set of finite traces that a process may execute. These proof-model
states are well-quasi-ordered, and operating on them sets up a WSTS. Backward
reachability gives us a set of maps from processes to potentials that allow us to
reach the target label. The key is to view a process-potential map as a requirement
on our message-based configuration. The higher a map is in the wqo, the stronger
the requirement it enforces. In this sense, the basis of states returned by backward
reachability constitutes the minimal requirements our configuration must meet in
order for the target label to be reachable. Formally, let $\gamma$ be a configuration of
our framework. The target label is reachable from $\gamma$ if and only if there exists a
process-potential map $B$ in the backward reachable set such that every trace
in every process' potential in $B$ is enabled in $\gamma$. It suffices to check the existence
of $B$ over the finite basis of the backward reachable set. Note that $\gamma$ is completely
arbitrary: this solves our Problem 2.


6 Related Work 


Fairness. Only recently has fairness for weak memory started receiving increas- 
ing attention. The work closest to ours is by [4], who formulate a probabilistic 
extension for the Total Store Order (TSO) memory model and show decidability 
results for associated verification problems. Our treatment of fairness is richer,
as we relate the same probabilistic fairness to two alternative logical fairness
definitions. Similar proof techniques notwithstanding, our verification results are also
more general, thanks to the development of a uniform framework that applies to 
a landscape of models. [37] develop a novel formulation of fairness as a declar- 
ative property of event structures. This notion informally translates to “Each 
message is eventually propagated.” We forego axiomatic elegance to motivate 
and develop stronger practical notions of fairness in our quest to verify liveness. 


Probabilistic Verification. There are several works on verification of finite-state
Markov chains (e.g. [14,33]). However, since the messages in our memory
systems are unbounded, these techniques do not apply. There is also substantive
literature on the verification of infinite-state probabilistic systems, which
have often been modelled as infinite Markov chains [17-19, 24, 25]. However, their
results cannot be directly leveraged to imply ours. The machinery we use for
showing decidability relies on decisive Markov chains, a concept formulated in
[7] and used in [4].


Framework. On the modelling front, the ability to specify memory model
semantics as first-order constraints over the program-order (po), reads-from (rf),
and modification-order (mo) relations has led to elegant declarative frameworks based on
event structures [9, 10, 21, 28]. There are also approaches that, instead of natively
characterizing semantics, prescribe constraints on their ISA-level behaviours in
terms of program transformations [38]. On the operational front, there have been
works that model individual memory models [43,46] and clusters of similar models
[30,35]; however, we are not aware of any operational modelling framework that
encompasses as wide a range of models as we do. The operationalization in [16]
uses write buffers, which resemble our channels; however, their operationalization
too focuses on a specific semantics.


7 Conclusion, Future Work, and Perspective 


Conclusion. The ideas developed in Sect. 4 lie at the heart of our contribution:
we motivate and define transition fairness augmented with memory size
boundedness or the recurrence of plain configurations, as well as the analogous
probabilistic memory fairness. These are equivalent for the purpose of verifying
repeated control state reachability, i.e. liveness, and lie at the core of the
techniques we discuss in Sect. 5. These techniques owe their generality to the
versatile framework we describe in Sect. 3.


Future Work. There are several interesting directions for future work. We believe
that our framework can be extended to handle weak memory models that allow
speculation, such as ARM and POWER. In such a case, we would need to extend
our fairness conditions to limit the amount of allowed speculation. It is also
interesting to mix transition fairness with probabilistic fairness, i.e., use the
former to resolve scheduler non-determinism and the latter to resolve memory
non-determinism, leading to an (infinite-state) Markov Decision Process model. Along
these lines, we can also consider synthesis problems based on 2½-games. To solve
such game problems, we could extend the framework of Decisive Markov chains
that has been developed for probabilistic and game theoretic problems over
infinite-state systems [7]. A natural next step is developing efficient algorithms
for proving liveness properties for programs running on weak memory models. In
particular, since we reduce the verification of liveness properties to simple
reachability, there is high hope one can develop CEGAR frameworks relying both on
over-approximations, such as predicate abstraction, and under-approximations
such as bounded context-switching [44] and stateless model checking [3,32].


Perspective. Leveraging techniques developed over the years by the program
verification community, and using them to solve research problems in program- 
ming languages, architectures, databases, etc., has substantial potential added 
value. Although it requires a deep understanding of program behaviors running 
on such platforms, we believe it is about finding the right concepts, combining 
them correctly, and then applying the existing rich set of program verification 
techniques, albeit in a non-trivial manner. The current paper is a case in point. 
Here, we have used a combination of techniques developed for reactive systems 
[31], methods for the analysis of infinite-state systems [7], and semantical models 
developed for weak memory models [12,30,34,35] to obtain, for the first time, a 
framework for the systematic analysis of liveness properties under weak memory 
models.


Abstract. Rely-guarantee (RG) is a highly influential compositional
proof technique for concurrent programs, which was originally devel- 
oped assuming a sequentially consistent shared memory. In this paper, 
we first generalize RG to make it parametric with respect to the under- 
lying memory model by introducing an RG framework that is applica- 
ble to any model axiomatically characterized by Hoare triples. Second, 
we instantiate this framework for reasoning about concurrent programs 
under causally consistent memory, which is formulated using a recently 
proposed potential-based operational semantics, thereby providing the 
first reasoning technique for such semantics. The proposed program logic, 
which we call Piccolo, employs a novel assertion language allowing one 
to specify ordered sequences of states that each thread may reach. We 
employ Piccolo for multiple litmus tests, as well as for an adaptation of 
Peterson’s algorithm for mutual exclusion to causally consistent memory. 


1 Introduction 


Rely-guarantee (RG) is a fundamental compositional proof technique for con- 
current programs [21,48]. Each program component P is specified using rely and 
guarantee conditions, which means that P can tolerate any environment inter- 
ference that follows its rely condition, and generate only interference included in 
its guarantee condition. Two components can be composed in parallel provided 
that the rely of each component agrees with the guarantee of the other. 

The original RG framework and its soundness proof have assumed a sequen- 
tially consistent (SC) memory [33], which is unrealistic in modern processor 
architectures and programming languages. Nevertheless, the main principles 
behind RG are not at all specific to SC. Accordingly, our first main contribution
is to formally decouple the underlying memory model from the RG proof principles
by proposing a generic RG framework parametric in the input memory
model. To do so, we assume that the underlying memory model is axiomatized 
by Hoare triples specifying pre- and postconditions on memory states for each 
primitive operation (e.g., loads and stores). This enables the formal development
of RG-based logics for different shared memory models as instances of
one framework. All instances build on a uniform soundness infrastructure for the RG
rules (e.g., for sequential and parallel composition) but employ different specialized
assertions to describe the possible memory states, so that specific soundness
arguments are only needed for primitive memory operations.

The second contribution of this paper is an instance of the general RG frame- 
work for causally consistent shared memory. The latter stands for a family of 
widespread and well-studied memory models weaker than SC, which are
sufficiently strong for implementing a variety of synchronization idioms [6, 12, 26].
Intuitively, unlike SC, causal consistency allows different threads to observe 
writes to memory in different orders, as long as they agree on the order of writes 
that are causally related. This concept can be formalized in multiple ways, and 
here we target a strong form of causal consistency, called strong release-acquire 
(SRA) [28,31] (and equivalent to “causal convergence” from [12]), which is a 
slight strengthening of the well-known release-acquire (RA) model (used by 
C/C++11). (The variants of causal consistency only differ for programs with
write/write races [10,28], which are rather rare in practice.) 

Our starting point for axiomatizing SRA as Hoare triples is the potential- 
based operational semantics of SRA, which was recently introduced with the 
goal of establishing the decidability of control state reachability under this 
model [27,28] (in contrast to undecidability under RA [1]). Unlike more standard 
presentations of weak memory models whose states record information about the 
past (e.g., in the form of store buffers containing executed writes before they are 
globally visible [36], partially ordered execution graphs [8, 20,31], or collections of 
timestamped messages and thread views [11,16,17,23,25,47]), the states of the 
potential-based model track possible futures ascribing what sequences of obser- 
vations each thread can perform. We find this approach to be a particularly 
appealing candidate for Hoare-style reasoning which would naturally generalize 
SC-based reasoning. Intuitively, while an assertion in SC specifies possible obser- 
vations at a given program point, an assertion in a potential-based model should 
specify possible sequences of observations. 

To pursue this direction, we introduce a novel assertion language, resembling 
temporal logics, which allows one to express properties of sequences of states. 
For instance, our assertions can express that a certain thread may currently read 
x = 0, but it will have to read x = 1 once it reads y = 1. Then, we provide Hoare 
triples for SRA in this assertion language, and incorporate them in the general 
RG framework. The resulting program logic, which we call Piccolo, provides 
a novel approach to reasoning about concurrent programs under causal consistency,
which allows for simple and direct proofs, and, we believe, may constitute a 
basis for automation in the future.


                    $\{y \neq 1\}$
Thread $T_1$                    Thread $T_2$
$\{\mathit{True}\}$                      $\{y = 1 \Rightarrow x = 1\}$
1: STORE(x, 1);         ||      3: a := LOAD(y);
$\{x = 1\}$                     $\{a = 1 \Rightarrow x = 1\}$
2: STORE(y, 1)          ||      4: b := LOAD(x)
$\{\mathit{True}\}$                      $\{a = 1 \Rightarrow b = 1\}$
                    $\{a = 1 \Rightarrow b = 1\}$

Fig. 1. Message passing in SC


                    $\{T_0 \ltimes [y \neq 1]\}$
Thread $T_1$                    Thread $T_2$
$\{\mathit{True}\}$                      $\{T_2 \ltimes [y \neq 1] \,;\, [x = 1]\}$
1: STORE(x, 1);         ||      3: a := LOAD(y);
$\{T_1 \ltimes [x = 1]\}$          $\{a = 1 \Rightarrow T_2 \ltimes [x = 1]\}$
2: STORE(y, 1)          ||      4: b := LOAD(x)
$\{\mathit{True}\}$                      $\{a = 1 \Rightarrow b = 1\}$
                    $\{a = 1 \Rightarrow b = 1\}$

Fig. 2. Message passing in SRA


2 Motivating Example 


To make our discussion concrete, consider the message passing program (MP) in 
Figs. 1 and 2, comprising shared variables x and y and local registers a and b. The 
proof outline in Fig. 1 assumes SC, whereas Fig. 2 assumes SRA. In both cases, 
at the end of the execution, we show that if a is 1, then b must also be 1. We 
use these examples to explain the two main concepts introduced in this paper: 
(i) a generic RG framework and (ii) its instantiation with a potential-focused 
assertion system that enables reasoning under SRA. 


Rely-Guarantee. The proof outline in Fig. 1 can be read as an RG derivation: 


1. Thread $T_1$ locally establishes its postcondition when starting from any state
that satisfies its precondition. This is trivial since its postcondition is True.

2. Thread $T_1$ relies on the fact that the assertions it uses are stable w.r.t.
interference from its environment. We formally capture this condition by a rely set
$\mathcal{R}_1 = \{\mathit{True},\ x = 1\}$.

3. Thread $T_1$ guarantees to its concurrent environment that its only
interferences are STORE(x, 1) and STORE(y, 1), and furthermore that STORE(y, 1) is
only performed when $x = 1$ holds. We formally capture this condition by
a guarantee set $\mathcal{G}_1 = \{\{\mathit{True}\}\ T_1 \mapsto \mathtt{STORE}(x, 1),\ \{x = 1\}\ T_1 \mapsto \mathtt{STORE}(y, 1)\}$,
where each element is a command guarded by a precondition.

4. Thread $T_2$ locally establishes its postcondition when starting from any state
that satisfies its precondition. This is straightforward using standard Hoare
rules for assignment and sequential composition.

5. Thread $T_2$'s rely set is again obtained by collecting all the assertions used
in its proof: $\mathcal{R}_2 = \{y = 1 \Rightarrow x = 1,\ a = 1 \Rightarrow x = 1,\ a = 1 \Rightarrow b = 1\}$. Indeed,
the local reasoning for $T_2$ needs all these assertions to be stable under the
environment interference.

6. Thread $T_2$'s guarantee set is given by:


$$\mathcal{G}_2 = \{\{y = 1 \Rightarrow x = 1\}\ T_2 \mapsto a := \mathtt{LOAD}(y),\ \{a = 1 \Rightarrow x = 1\}\ T_2 \mapsto b := \mathtt{LOAD}(x)\}$$


7. To perform the parallel composition, $(\mathcal{R}_1, \mathcal{G}_1)$ and $(\mathcal{R}_2, \mathcal{G}_2)$ should be
non-interfering. This involves showing that each $R \in \mathcal{R}_i$ is stable under each
$G \in \mathcal{G}_j$ for $i \neq j$. That is, if $G = \{P\}\ \tau \mapsto c$, we require the Hoare triple
$\{P \wedge R\}\ \tau \mapsto c\ \{R\}$ to hold. In this case, these proof obligations are straightforward
to discharge using Hoare's assignment axiom (and are trivial for $i = 1$ and $j = 2$
since load instructions leave the memory intact).
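
The non-interference obligations of step 7 can be checked mechanically for this small example. The following Python sketch is an illustration under assumptions, not the paper's tooling: it restricts all variables and registers to the values 0 and 1, models SC states as dictionaries, and checks each Hoare triple by brute force over all such states. All function names are hypothetical.

```python
from itertools import product

VARS = ("x", "y", "a", "b")
# Assumption: values are restricted to {0, 1} so that states can be enumerated.
STATES = [dict(zip(VARS, vals)) for vals in product((0, 1), repeat=len(VARS))]


def implies(p, q):
    return (not p) or q


def store(var, val):
    """STORE(var, val) as a state transformer under SC."""
    return lambda s: {**s, var: val}


def load(reg, var):
    """reg := LOAD(var) as a state transformer under SC."""
    return lambda s: {**s, reg: s[var]}


def valid(pre, cmd, post):
    """Hoare triple {pre} cmd {post}, checked over all enumerated states."""
    return all(post(cmd(s)) for s in STATES if pre(s))


# Rely sets and guarantee sets from the derivation for Fig. 1.
R1 = [lambda s: True, lambda s: s["x"] == 1]
R2 = [lambda s: implies(s["y"] == 1, s["x"] == 1),
      lambda s: implies(s["a"] == 1, s["x"] == 1),
      lambda s: implies(s["a"] == 1, s["b"] == 1)]
G1 = [(lambda s: True, store("x", 1)),
      (lambda s: s["x"] == 1, store("y", 1))]
G2 = [(lambda s: implies(s["y"] == 1, s["x"] == 1), load("a", "y")),
      (lambda s: implies(s["a"] == 1, s["x"] == 1), load("b", "x"))]


def non_interfering(rely, guar):
    """Every R in rely is stable under every guarded command in guar."""
    return all(valid(lambda s, P=P, R=R: P(s) and R(s), cmd, R)
               for R in rely for (P, cmd) in guar)


print(non_interfering(R1, G2), non_interfering(R2, G1))
```

Dropping the guard $x = 1$ from STORE(y, 1) makes the check fail for the assertion $y = 1 \Rightarrow x = 1$, which shows why the guarantee set records preconditions and not just commands.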


Remark 1. Classical treatments of RG involve two related ideas [21]: (1) spec- 
ifying a component by rely and guarantee conditions (together with standard 
pre- and postconditions); and (2) taking the relies and guarantees to be binary 
relations over states. Our approach adopts (1) but not (2). Thus, it can be seen 
as an RG presentation of the Owicki-Gries method [37], as was previously done 
in [82]. We have not observed an advantage for using binary relations in our 
examples, but the framework can be straightforwardly modified to do so. 


Now, observe that substantial aspects of the above reasoning are not directly 
tied with SC. This includes the Hoare rules for compound commands (such as 
sequential composition above), the idea of specifying a thread using collections of 
stable rely assertions and guaranteed guarded primitive commands, and the non- 
interference condition for parallel composition. To carry out this generalization, 
we assume that we are provided an assertion language whose assertions are 
interpreted as sets of memory states (which can be much more involved than 
simple mappings of variables to values), and a set of valid Hoare triples for the 
primitive instructions. The latter is used for checking validity of primitive triples
(e.g., $\{P\}\ T_1 \mapsto \mathtt{STORE}(x, 1)\ \{Q\}$), as well as non-interference conditions (e.g.,
$\{P \wedge R\}\ T_1 \mapsto \mathtt{STORE}(x, 1)\ \{R\}$). In Sect. 4, we present this generalization, and
establish the soundness of RG principles independently of the memory model.


Potential-Based Reasoning. The second contribution of our work is an appli- 
cation of the above to develop a logic for a potential-based operational semantics 
that captures SRA. In this semantics every memory state records sequences of 
store mappings (from shared variables to values) that each thread may observe. 
For example, assuming all variables are initialized to 0, if $T_1$ executed its code
until completion before $T_2$ even started (so under SC the memory state is the
store $\{x \mapsto 1, y \mapsto 1\}$), we may reach the SRA state in which $T_1$'s potential
consists of one store $\{x \mapsto 1, y \mapsto 1\}$, and $T_2$'s potential is the sequence of stores:


$$\langle \{x \mapsto 0, y \mapsto 0\},\ \{x \mapsto 1, y \mapsto 0\},\ \{x \mapsto 1, y \mapsto 1\} \rangle,$$


which captures the stores that $T_2$ may observe in the order it may observe
them. Naturally, potentials are lossy, allowing threads to non-deterministically
lose a subsequence of the current store sequence, so they can progress in their
sequences. Thus, $T_2$ can read 1 from y only after it loses the first two stores in
its potential, and from this point on it can only read 1 from x. Now, one can see
that all potentials of $T_2$ at its initial program point are, in fact, subsequences of
the above sequence (regardless of where $T_1$ is), and conclude that $a = 1 \Rightarrow b = 1$
holds when $T_2$ terminates.
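
The informal argument can be replayed by enumeration. The sketch below is a deliberately simplified model and makes several assumptions: the writer thread has already finished (so no further writes change the potential), a load reads from the first store of the current potential, and before each load the reader may lose any stores as long as its potential stays non-empty. The names `subsequences` and `outcomes` are hypothetical.

```python
from itertools import combinations


def subsequences(seq):
    """All non-empty subsequences of seq (order preserved)."""
    n = len(seq)
    for k in range(1, n + 1):
        for idxs in combinations(range(n), k):
            yield [seq[i] for i in idxs]


def outcomes(potential):
    """All (a, b) results of  a := LOAD(y); b := LOAD(x)  for a thread that
    starts with the given potential and may lose stores before each load."""
    results = set()
    for before_a in subsequences(potential):
        a = before_a[0]["y"]          # a load reads from the first store
        for before_b in subsequences(before_a):
            b = before_b[0]["x"]
            results.add((a, b))
    return results


# Potential of the reader thread in the message passing example.
T2_POTENTIAL = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]
print(sorted(outcomes(T2_POTENTIAL)))
```

Under these assumptions the pair (1, 0) never appears among the outcomes, matching the claim that reading 1 from y forces a later read of 1 from x.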

To capture the above informal reasoning in a Hoare logic, we designed a 
new form of assertions capturing possible locally observable sequences of stores, 
rather than one global store, which can be seen as a restricted fragment of linear 
temporal logic. The proof outline using these assertions is given in Fig. 2. In 
particular, $[x = 1]$ is satisfied by all store sequences in which every store maps x
to 1, whereas $[y \neq 1] \,;\, [x = 1]$ is satisfied by all store sequences that can be split
into a (possibly empty) prefix whose value for y is not 1 followed by a (possibly
empty) suffix whose value for x is 1. Assertions of the form $\tau \ltimes I$ state that the
potential of thread $\tau$ includes only store sequences that satisfy $I$.
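
The two assertion forms described here can be read as predicates on store sequences. The following Python sketch illustrates that reading; it assumes stores are dictionaries and store sequences are lists, and the combinator names `box` and `chop` are hypothetical, chosen only for this illustration.

```python
def box(pred):
    """[pred]: every store in the sequence satisfies pred
    (vacuously true for the empty sequence)."""
    return lambda stores: all(pred(s) for s in stores)


def chop(left, right):
    """I1 ; I2: the sequence splits into a (possibly empty) prefix satisfying
    I1 followed by a (possibly empty) suffix satisfying I2."""
    return lambda stores: any(left(stores[:k]) and right(stores[k:])
                              for k in range(len(stores) + 1))


y_not_1 = box(lambda s: s["y"] != 1)
x_is_1 = box(lambda s: s["x"] == 1)
mp_invariant = chop(y_not_1, x_is_1)      # [y != 1] ; [x = 1]

seq = [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 1, "y": 1}]
print(mp_invariant(seq))                  # split after the second store
print(mp_invariant([{"x": 0, "y": 1}]))   # no valid split exists
```

Trying every split point is enough because the chop operator only asks for the existence of one prefix/suffix decomposition.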

The first assertion of $T_2$ is implied by the initial condition, $T_0 \ltimes [y \neq 1]$, since
the potential of the parent thread $T_0$ is inherited by the forked child threads and
$T_2 \ltimes [y \neq 1]$ implies $T_2 \ltimes [y \neq 1] \,;\, I$ for any $I$. Moreover, $T_2 \ltimes [y \neq 1] \,;\, [x = 1]$
is preserved by (i) line 1 because writing 1 to x leaves $[y \neq 1]$ unchanged and
re-establishes $[x = 1]$; and (ii) line 2 because the semantics for SRA ensures
that after reading 1 from y by $T_2$, the thread $T_2$ is confined by $T_1$'s potential
just before it wrote 1 to y, which has to satisfy the precondition $T_1 \ltimes [x = 1]$.
(SRA allows updating the other threads' potential only when the suffix of the
potential after the update is observable by the writer thread.)

In Sect.6 we formalize these arguments as Hoare rules for the primitive 
instructions, whose soundness is checked using the potential-based operational 
semantics and the interpretation of the assertion language. Finally, Piccolo is 
obtained by incorporating these Hoare rules in the general RG framework. 


Remark 2. Our presentation of the potential-based semantics for SRA (fully pre- 
sented in Sect. 5) deviates from the original one in [28], where it was called loSRA. 
The most crucial difference is that while loSRA’s potentials consist of lists of per- 
location read options, our potentials consist of lists of stores assigning a value 
to every variable. (This is similar in spirit to the adaptation of load buffers for 
TSO [4,5] to snapshot buffers in [2]). Additionally, unlike loSRA, we disallow empty 
potential lists, require that the potentials of the different threads agree on the very 
last value to each location, and handle read-modify-write (RMW) instructions dif- 
ferently. We employed these modifications to loSRA as we observed that direct rea- 
soning on loSRA states is rather unnatural and counterintuitive, as loSRA allows 
traces that block a thread from reading any value from certain locations (which cannot
happen in the version we formulate). For example, a direct interpretation of our
assertions over loSRA states would allow states in which $\tau \ltimes [x = v]$ and $\tau \ltimes [x \neq v]$
both hold (when $\tau$ does not have any option to read from x), while these assertions
are naturally contradictory when interpreted on top of our modified SRA seman- 
tics. To establish confidence in the new potential-based semantics we have proved 
in Coq its equivalence to the standard execution-graph based semantics of SRA 
(over 5K lines of Coq proofs) [29]. 


3 Preliminaries: Syntax and Semantics 


In this section we describe the underlying program language, leaving the shared- 
memory semantics parametric. 


Syntax. The syntax of programs, given in Fig. 3, is mostly standard, comprising 
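primitive (atomic) commands c and compound commands C. The non-standard


values $v \in \mathsf{Val} = \{0, 1, \ldots\}$        shared variables $x, y \in \mathsf{Loc} = \{x, y, \ldots\}$
local registers $r \in \mathsf{Reg} = \{a, b, \ldots\}$        thread identifiers $\tau, \pi \in \mathsf{Tid} = \{T_0, T_1, \ldots\}$

$$
\begin{array}{rcl}
e & ::= & r \mid v \mid e + e \mid e = e \mid \neg e \mid e \wedge e \mid e \vee e \mid \ldots \\
c & ::= & r := e \mid \mathtt{STORE}(x, e) \mid r := \mathtt{LOAD}(x) \mid \mathtt{SWAP}(x, e) \qquad\quad \tilde{c} ::= \langle c, \vec{r} := \vec{e} \rangle \\
C & ::= & c \mid \tilde{c} \mid \mathbf{skip} \mid C \,;\, C \mid \mathbf{if}\ e\ \mathbf{then}\ C\ \mathbf{else}\ C \mid \mathbf{while}\ e\ \mathbf{do}\ C \mid C \;{}_{\tau_1}\!\|_{\tau_2}\; C
\end{array}
$$


Fig. 3. Program syntax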


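$$
\frac{\gamma' = \gamma[r \mapsto \gamma(e)]}{r := e \gg \gamma \xrightarrow{\varepsilon} \gamma'}
\qquad
\frac{l = \mathtt{W}(x, \gamma(e))}{\mathtt{STORE}(x, e) \gg \gamma \xrightarrow{l} \gamma}
\qquad
\frac{l = \mathtt{R}(x, v) \qquad \gamma' = \gamma[r \mapsto v]}{r := \mathtt{LOAD}(x) \gg \gamma \xrightarrow{l} \gamma'}
$$

$$
\frac{l = \mathtt{RMW}(x, v, \gamma(e))}{\mathtt{SWAP}(x, e) \gg \gamma \xrightarrow{l} \gamma}
\qquad
\frac{c \gg \gamma \xrightarrow{l_\varepsilon} \gamma_0 \qquad r_1 := e_1 \gg \gamma_0 \xrightarrow{\varepsilon} \gamma_1 \quad \cdots \quad r_n := e_n \gg \gamma_{n-1} \xrightarrow{\varepsilon} \gamma_n}{\langle c, \langle r_1, \ldots, r_n \rangle := \langle e_1, \ldots, e_n \rangle \rangle \gg \gamma \xrightarrow{l_\varepsilon} \gamma_n}
$$


Fig. 4. Small-step semantics of (instrumented) primitive commands ($\tilde{c} \gg \gamma \xrightarrow{l_\varepsilon} \gamma'$)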


components are instrumented commands č, which are meant to atomically exe- 
cute a primitive command c and a (multiple) assignment r := e. Such instruc- 
tions are needed to support auxiliary (a.k.a. ghost) variables in RG proofs. In 
addition, SWAP (a.k.a. atomic exchange) is an example of an RMW instruction. 
For brevity, other standard RMW instructions, such as FADD and CAS, are omit- 
ted. 

Unlike many weak memory models that only support top-level parallelism, 
we include dynamic thread creation via commands of the form $C_1 \;{}_{\tau_1}\!\|_{\tau_2}\; C_2$ that
fork two threads named $\tau_1$ and $\tau_2$ that execute the commands $C_1$ and $C_2$,
respectively. Each $C_i$ may itself comprise further parallel compositions. Since
thread identifiers are explicit, we require commands to be well formed. Let $\mathsf{Tid}(C)$
be the set of all thread identifiers that appear in $C$. A command $C$ is well
formed, denoted $\mathsf{wf}(C)$, if parallel compositions inside employ disjoint sets of
thread identifiers. This notion is formally defined by induction on the structure
of commands, with the only interesting case being $\mathsf{wf}(C_1 \;{}_{\tau_1}\!\|_{\tau_2}\; C_2)$ if
$\mathsf{wf}(C_1) \wedge \mathsf{wf}(C_2) \wedge \tau_1 \neq \tau_2 \wedge \mathsf{Tid}(C_1) \cap \mathsf{Tid}(C_2) = \emptyset$.
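
The inductive check can be sketched on a small abstract syntax. This Python sketch is an illustration under stated assumptions: it takes the parallel case to require that both sub-commands are well formed, that the two forked thread identifiers differ, and that the identifier sets of the two sub-commands are disjoint; conditionals and loops are omitted (they would be handled like sequencing). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prim:
    """Any primitive or instrumented command, or skip."""
    text: str


@dataclass(frozen=True)
class Seq:
    first: object
    second: object


@dataclass(frozen=True)
class Par:
    """Parallel composition of left and right, run by threads t1 and t2."""
    left: object
    t1: str
    t2: str
    right: object


def tids(c):
    """Thread identifiers that appear in command c."""
    if isinstance(c, Par):
        return {c.t1, c.t2} | tids(c.left) | tids(c.right)
    if isinstance(c, Seq):
        return tids(c.first) | tids(c.second)
    return set()


def wf(c):
    """Well-formedness by induction on the structure of commands."""
    if isinstance(c, Par):
        return (wf(c.left) and wf(c.right) and c.t1 != c.t2
                and not (tids(c.left) & tids(c.right)))
    if isinstance(c, Seq):
        return wf(c.first) and wf(c.second)
    return True


p = Prim("skip")
mp = Par(Prim("STORE(x,1); STORE(y,1)"), "T1", "T2", Prim("a := LOAD(y); b := LOAD(x)"))
clash = Par(Par(p, "T3", "T4", p), "T1", "T2", Par(p, "T3", "T5", p))
print(wf(mp), wf(clash))
```

The second command is rejected because the identifier `T3` is used on both sides of the outer parallel composition.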


Program Semantics. We provide small-step operational semantics to com- 
mands independently of the memory system. To connect this semantics to a 
given memory system, its steps are instrumented with labels, as defined next. 


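Definition 1. A label $l$ takes one of the following forms: a read $\mathtt{R}(x, v_R)$, a
write $\mathtt{W}(x, v_W)$, a read-modify-write $\mathtt{RMW}(x, v_R, v_W)$, a fork $\mathtt{FORK}(\tau_1, \tau_2)$, or a join
$\mathtt{JOIN}(\tau_1, \tau_2)$, where $x \in \mathsf{Loc}$, $v_R, v_W \in \mathsf{Val}$, and $\tau_1, \tau_2 \in \mathsf{Tid}$. We denote by $\mathsf{Lab}$ the
set of all labels.


$$
\frac{\tilde{c} \gg \gamma \xrightarrow{l_\varepsilon} \gamma'}{(\tilde{c}, \gamma) \xrightarrow{l_\varepsilon} (\mathbf{skip}, \gamma')}
\qquad
\frac{(C_1, \gamma) \xrightarrow{l_\varepsilon} (C_1', \gamma')}{(C_1 \,;\, C_2, \gamma) \xrightarrow{l_\varepsilon} (C_1' \,;\, C_2, \gamma')}
\qquad
\frac{}{(\mathbf{skip} \,;\, C_2, \gamma) \xrightarrow{\varepsilon} (C_2, \gamma)}
$$

$$
\frac{\gamma(e) = \mathit{true} \Rightarrow i = 1 \qquad \gamma(e) \neq \mathit{true} \Rightarrow i = 2}{(\mathbf{if}\ e\ \mathbf{then}\ C_1\ \mathbf{else}\ C_2, \gamma) \xrightarrow{\varepsilon} (C_i, \gamma)}
\qquad
\frac{C' = \mathbf{if}\ e\ \mathbf{then}\ (C \,;\, \mathbf{while}\ e\ \mathbf{do}\ C)\ \mathbf{else}\ \mathbf{skip}}{(\mathbf{while}\ e\ \mathbf{do}\ C, \gamma) \xrightarrow{\varepsilon} (C', \gamma)}
$$


Fig. 5. Small-step semantics of commands ($(C, \gamma) \xrightarrow{l_\varepsilon} (C', \gamma')$)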


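$$
\frac{(C, \gamma) \xrightarrow{l_\varepsilon} (C', \gamma')}{(\mathcal{C}_0 \uplus \{\tau \mapsto C\}, \gamma) \xrightarrow{\tau : l_\varepsilon} (\mathcal{C}_0 \uplus \{\tau \mapsto C'\}, \gamma')}
$$

$$
\frac{\mathcal{C}(\tau) = C_1 \;{}_{\tau_1}\!\|_{\tau_2}\; C_2 \qquad \tau_1 \notin \mathit{dom}(\mathcal{C}) \qquad \tau_2 \notin \mathit{dom}(\mathcal{C}) \qquad l = \mathtt{FORK}(\tau_1, \tau_2) \qquad \mathcal{C}' = \{\tau_1 \mapsto C_1, \tau_2 \mapsto C_2\}}{(\mathcal{C}, \gamma) \xrightarrow{\tau : l} (\mathcal{C} \uplus \mathcal{C}', \gamma)}
$$

$$
\frac{\mathcal{C} = \{\tau \mapsto C_1 \;{}_{\tau_1}\!\|_{\tau_2}\; C_2,\ \tau_1 \mapsto \mathbf{skip},\ \tau_2 \mapsto \mathbf{skip}\} \qquad l = \mathtt{JOIN}(\tau_1, \tau_2) \qquad \mathcal{C}' = \{\tau \mapsto \mathbf{skip}\}}{(\mathcal{C}_0 \uplus \mathcal{C}, \gamma) \xrightarrow{\tau : l} (\mathcal{C}_0 \uplus \mathcal{C}', \gamma)}
$$


Fig. 6. Small-step semantics of command pools ($(\mathcal{C}, \gamma) \xrightarrow{\tau : l_\varepsilon} (\mathcal{C}', \gamma')$)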


Definition 2. A register store is a mapping $\gamma : \mathsf{Reg} \to \mathsf{Val}$. Register stores are
extended to expressions as expected. We denote by $\Gamma$ the set of all register stores.


The semantics of (instrumented) primitive commands is given in Fig. 4. Using
this definition, the semantics of commands is given in Fig. 5. Its steps are of the
form $(C, \gamma) \xrightarrow{l_\varepsilon} (C', \gamma')$ where $C$ and $C'$ are commands, $\gamma$ and $\gamma'$ are register
stores, and $l_\varepsilon \in \mathsf{Lab} \cup \{\varepsilon\}$ ($\varepsilon$ denotes a thread internal step). We lift this semantics
to command pools as follows.


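Definition 3. A command pool is a non-empty partial function $\mathcal{C}$ from thread
identifiers to commands, such that the following hold:


1. $\mathsf{Tid}(\mathcal{C}(\tau_1)) \cap \mathsf{Tid}(\mathcal{C}(\tau_2)) = \emptyset$ for every $\tau_1 \neq \tau_2$ in $\mathit{dom}(\mathcal{C})$.

2. $\tau \notin \mathsf{Tid}(\mathcal{C}(\tau))$ for every $\tau \in \mathit{dom}(\mathcal{C})$.

We write command pools as sets of the form $\{\tau_1 \mapsto C_1, \ldots, \tau_n \mapsto C_n\}$.
Steps for command pools are given in Fig. 6. They take the form
$(\mathcal{C}, \gamma) \xrightarrow{\tau : l_\varepsilon} (\mathcal{C}', \gamma')$, where $\mathcal{C}$ and $\mathcal{C}'$ are command pools, $\gamma$ and $\gamma'$ are register stores, and
$\tau : l_\varepsilon$ (with $\tau \in \mathsf{Tid}$ and $l_\varepsilon \in \mathsf{Lab} \cup \{\varepsilon\}$) is a command transition label.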


Memory Semantics. To give semantics to programs under a memory model,
we synchronize the transitions of a command $C$ with a memory system. We
leave the memory system parametric, and assume that it is represented by a
labeled transition system (LTS) $M$ with set of states denoted by $M.Q$, and
steps denoted by $\to_M$. The transition labels of a general memory system $M$
consist of non-silent program transition labels (elements of $\mathsf{Tid} \times \mathsf{Lab}$) and a
(disjoint) set $M.\Theta$ of internal memory actions, which is again left parametric
(used, e.g., for memory-internal propagation of values).


Example 1. The simple memory system that guarantees sequential consistency is
denoted here by SC. This memory system tracks the most recent value written to
each variable and has no internal transitions ($\mathsf{SC}.\Theta = \emptyset$). Formally, it is defined
by $\mathsf{SC}.Q = \mathsf{Loc} \to \mathsf{Val}$, and $\to_{\mathsf{SC}}$ is given by:

$$
\frac{l = \mathtt{R}(x, v_R) \qquad m(x) = v_R}{m \xrightarrow{\tau : l}_{\mathsf{SC}} m}
\qquad
\frac{l = \mathtt{W}(x, v_W) \qquad m' = m[x \mapsto v_W]}{m \xrightarrow{\tau : l}_{\mathsf{SC}} m'}
$$

$$
\frac{l = \mathtt{RMW}(x, v_R, v_W) \qquad m(x) = v_R \qquad m' = m[x \mapsto v_W]}{m \xrightarrow{\tau : l}_{\mathsf{SC}} m'}
\qquad
\frac{l \in \{\mathtt{FORK}(\_, \_), \mathtt{JOIN}(\_, \_)\}}{m \xrightarrow{\tau : l}_{\mathsf{SC}} m}
$$

The composition of a program with a general memory system is defined next.
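
As a concrete reading of Example 1, the SC memory system can be sketched as a step function on memories. This Python sketch is an illustration under assumptions: memories are dictionaries from locations to values, labels are encoded as tuples such as `("R", "x", 1)`, and the thread identifier is omitted because the SC transitions do not depend on it. The function name `sc_step` is hypothetical.

```python
def sc_step(m, label):
    """Return the successor memory if `label` is enabled in m under SC,
    and None if the label is not enabled."""
    kind = label[0]
    if kind == "R":                      # ("R", x, v_R): value must match
        _, x, v_r = label
        return m if m[x] == v_r else None
    if kind == "W":                      # ("W", x, v_W): overwrite x
        _, x, v_w = label
        return {**m, x: v_w}
    if kind == "RMW":                    # ("RMW", x, v_R, v_W): atomic read + write
        _, x, v_r, v_w = label
        return {**m, x: v_w} if m[x] == v_r else None
    if kind in ("FORK", "JOIN"):         # forks and joins leave memory unchanged
        return m
    raise ValueError(f"unknown label {label!r}")


m0 = {"x": 0, "y": 0}
m1 = sc_step(m0, ("W", "x", 1))
print(m1, sc_step(m1, ("R", "x", 0)), sc_step(m1, ("RMW", "y", 0, 1)))
```

Returning `None` for a disabled label mirrors the fact that a read or RMW rule only fires when the value read equals the current contents of the location.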


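Definition 4. The concurrent system induced by a memory system $M$, denoted
by $\overline{M}$, is the LTS whose transition labels are the elements of $(\mathsf{Tid} \times (\mathsf{Lab} \cup \{\varepsilon\})) \cup M.\Theta$;
states are triples of the form $(\mathcal{C}, \gamma, m)$ where $\mathcal{C}$ is a command pool, $\gamma$ is a
register store, and $m \in M.Q$; and the transitions are "synchronized transitions" of
the program and the memory system, using labels to decide what to synchronize
on, formally given by:


$$
\frac{(\mathcal{C}, \gamma) \xrightarrow{\tau : l} (\mathcal{C}', \gamma') \qquad l \in \mathsf{Lab} \qquad m \xrightarrow{\tau : l}_{M} m'}{(\mathcal{C}, \gamma, m) \xrightarrow{\tau : l}_{\overline{M}} (\mathcal{C}', \gamma', m')}
\qquad
\frac{(\mathcal{C}, \gamma) \xrightarrow{\tau : \varepsilon} (\mathcal{C}', \gamma')}{(\mathcal{C}, \gamma, m) \xrightarrow{\tau : \varepsilon}_{\overline{M}} (\mathcal{C}', \gamma', m)}
\qquad
\frac{\theta \in M.\Theta \qquad m \xrightarrow{\theta}_{M} m'}{(\mathcal{C}, \gamma, m) \xrightarrow{\theta}_{\overline{M}} (\mathcal{C}, \gamma, m')}
$$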


4 Generic Rely-Guarantee Reasoning 


In this section we present our generic RG framework. Rather than committing to 
a specific assertion language, our reasoning principles apply on the semantic level, 
using sets of states instead of syntactic assertions. The structure of proofs still 
follows program structure, thereby retaining RG’s compositionality. By doing 
so, we decouple the semantic insights of RG reasoning from a concrete syntax. 
Next, we present proof rules serving as blueprints for memory model specific 
proof systems. An instantiation of this blueprint requires lifting the semantic 
principles to syntactic ones. More specifically, it requires 


1. a language with (a) concrete assertions for specifying sets of states and (b) 
operators that match operations on sets of states (like $\wedge$ matches $\cap$); and
2. sound Hoare triples for primitive commands. 


Thus, each instance of the framework (for a specific memory system) is left 
with the task of identifying useful abstractions on states, as well as a suitable 
formalism, for making the generic semantic framework into a proof system. 


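RG Judgments. We let $\mathcal{M}$ be an arbitrary memory system and $\Sigma_{\mathcal{M}} \triangleq \Gamma \times \mathcal{M}.\mathsf{Q}$.
Properties of programs $\mathcal{C}$ are stated via RG judgments:

\[
\mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q)
\]

where $P, Q \subseteq \Sigma_{\mathcal{M}}$, $\mathcal{R} \subseteq \mathcal{P}(\Sigma_{\mathcal{M}})$, and $\mathcal{G}$ is a set of guarded commands, each of
which takes the form $\{G\}\ \tau \mapsto \alpha$, where $G \subseteq \Sigma_{\mathcal{M}}$ and $\alpha$ is either an (instrumented)
primitive command $\tilde{c}$ or a fork/join label (of the form $\mathtt{FORK}(\tau_1, \tau_2)$ or
$\mathtt{JOIN}(\tau_1, \tau_2)$). The latter is needed for considering the effect of forks and joins on
the memory state.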


Interpretation of RG Judgments. RG judgments $\mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q)$ state
that a terminating run of $\mathcal{C}$ starting from a state in $P$, under any concurrent
context whose transitions preserve each of the sets of states in $\mathcal{R}$, will end in a
state in $Q$ and perform only transitions contained in $\mathcal{G}$. To formally define this
statement, following the standard model for RG, these judgments are interpreted
on computations of programs. Computations arise from runs of the concurrent
system (see Definition 4) by abstracting away from concrete transition labels and
including arbitrary "environment transitions" representing steps of the concurrent
context. We have:

- Component transitions of the form $\langle \mathcal{C}, \gamma, m \rangle \xrightarrow{\mathsf{cmp}} \langle \mathcal{C}', \gamma', m' \rangle$.

- Memory transitions, which correspond to internal memory steps (labeled with
$\theta \in \mathcal{M}.\Theta$), of the form $\langle \mathcal{C}, \gamma, m \rangle \xrightarrow{\mathsf{mem}} \langle \mathcal{C}, \gamma, m' \rangle$.

- Environment transitions of the form $\langle \mathcal{C}, \gamma, m \rangle \xrightarrow{\mathsf{env}} \langle \mathcal{C}, \gamma', m' \rangle$.


Note that memory transitions do not occur in the classical RG presentation 
(since SC does not have internal memory actions). 
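A computation is a (potentially infinite) sequence

\[
\xi = \langle \mathcal{C}_0, \gamma_0, m_0 \rangle \xrightarrow{a_1} \langle \mathcal{C}_1, \gamma_1, m_1 \rangle \xrightarrow{a_2} \cdots
\]

with $a_i \in \{\mathsf{cmp}, \mathsf{env}, \mathsf{mem}\}$. We let $\langle \mathcal{C}_{last(\xi)}, \gamma_{last(\xi)}, m_{last(\xi)} \rangle$ denote its last element,
when $\xi$ is finite. We say that $\xi$ is a computation of a command pool $\mathcal{C}$ when $\mathcal{C}_0 = \mathcal{C}$
and for every $i \geq 0$:

- If $a_{i+1} = \mathsf{cmp}$, then $\langle \mathcal{C}_i, \gamma_i, m_i \rangle \xrightarrow{\tau : l_\varepsilon}_{\overline{\mathcal{M}}} \langle \mathcal{C}_{i+1}, \gamma_{i+1}, m_{i+1} \rangle$ for some $\tau \in \mathsf{Tid}$ and
$l_\varepsilon \in \mathsf{Lab} \cup \{\varepsilon\}$.

- If $a_{i+1} = \mathsf{mem}$, then $\langle \mathcal{C}_i, \gamma_i, m_i \rangle \xrightarrow{\theta}_{\overline{\mathcal{M}}} \langle \mathcal{C}_{i+1}, \gamma_{i+1}, m_{i+1} \rangle$ for some $\theta \in \mathcal{M}.\Theta$.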


We denote by Comp(C) the set of all computations of a command pool C. 
To define validity of RG judgments, we use the following definition. 


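Definition 5. Let $\xi = \langle \mathcal{C}_0, \gamma_0, m_0 \rangle \xrightarrow{a_1} \langle \mathcal{C}_1, \gamma_1, m_1 \rangle \xrightarrow{a_2} \cdots$ be a computation,
and $\mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q)$ an RG-judgment.

- $\xi$ admits $P$ if $\langle \gamma_0, m_0 \rangle \in P$.
- $\xi$ admits $\mathcal{R}$ if $\langle \gamma_i, m_i \rangle \in R \Rightarrow \langle \gamma_{i+1}, m_{i+1} \rangle \in R$ for every $R \in \mathcal{R}$ and $i \geq 0$
with $a_{i+1} = \mathsf{env}$.
- $\xi$ admits $\mathcal{G}$ if for every $i \geq 0$ with $a_{i+1} = \mathsf{cmp}$ and $\langle \gamma_i, m_i \rangle \neq \langle \gamma_{i+1}, m_{i+1} \rangle$
there exists $\{P\}\ \tau \mapsto \alpha \in \mathcal{G}$ such that $\langle \gamma_i, m_i \rangle \in P$ and
  - if $\alpha = \tilde{c}$ is an instrumented primitive command, then for some $l_\varepsilon \in \mathsf{Lab} \cup \{\varepsilon\}$,
    we have $\langle \{\tau \mapsto \tilde{c}\}, \gamma_i, m_i \rangle \xrightarrow{\tau : l_\varepsilon}_{\overline{\mathcal{M}}} \langle \{\tau \mapsto \mathbf{skip}\}, \gamma_{i+1}, m_{i+1} \rangle$;
  - if $\alpha \in \{\mathtt{FORK}(\tau_1, \tau_2), \mathtt{JOIN}(\tau_1, \tau_2)\}$, then $m_i \xrightarrow{\tau, \alpha}_{\mathcal{M}} m_{i+1}$ and $\gamma_i = \gamma_{i+1}$.
- $\xi$ admits $Q$ if $\langle \gamma_{last(\xi)}, m_{last(\xi)} \rangle \in Q$ whenever $\xi$ is finite and $\mathcal{C}_{last(\xi)}(\tau) = \mathbf{skip}$
for every $\tau \in dom(\mathcal{C}_{last(\xi)})$.

\[
\text{SKIP}\ \frac{}{\{\tau \mapsto \mathbf{skip}\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \{P\}, \emptyset, P)}
\qquad
\text{COM}\ \frac{\mathcal{M} \vDash \{P\}\ \tau \mapsto \tilde{c}\ \{Q\}}
                 {\{\tau \mapsto \tilde{c}\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \{P, Q\}, \{\{P\}\ \tau \mapsto \tilde{c}\}, Q)}
\]
\[
\text{SEQ}\ \frac{\{\tau \mapsto C_1\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}_1, \mathcal{G}_1, R) \qquad \{\tau \mapsto C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (R, \mathcal{R}_2, \mathcal{G}_2, Q)}
                 {\{\tau \mapsto C_1 ; C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}_1 \cup \mathcal{R}_2, \mathcal{G}_1 \cup \mathcal{G}_2, Q)}
\]
\[
\text{IF}\ \frac{\{\tau \mapsto C_1\} \ \mathsf{sat}_{\mathcal{M}}\ (P \cap [\![e]\!], \mathcal{R}_1, \mathcal{G}_1, Q) \qquad \{\tau \mapsto C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P \setminus [\![e]\!], \mathcal{R}_2, \mathcal{G}_2, Q)}
                {\{\tau \mapsto \mathbf{if}\ e\ \mathbf{then}\ C_1\ \mathbf{else}\ C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}_1 \cup \mathcal{R}_2 \cup \{P\}, \mathcal{G}_1 \cup \mathcal{G}_2, Q)}
\]
\[
\text{WHILE}\ \frac{P \setminus [\![e]\!] \subseteq Q \qquad \{\tau \mapsto C\} \ \mathsf{sat}_{\mathcal{M}}\ (P \cap [\![e]\!], \mathcal{R}, \mathcal{G}, P)}
                   {\{\tau \mapsto \mathbf{while}\ e\ \mathbf{do}\ C\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R} \cup \{P, Q\}, \mathcal{G}, Q)}
\]
\[
\text{PAR}\ \frac{\begin{array}{c}
  \{\tau_1 \mapsto C_1\} \ \mathsf{sat}_{\mathcal{M}}\ (P_1, \mathcal{R}_1, \mathcal{G}_1, Q_1) \qquad \{\tau_2 \mapsto C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P_2, \mathcal{R}_2, \mathcal{G}_2, Q_2) \\
  P \subseteq P_1 \cap P_2 \qquad Q_1 \cap Q_2 \subseteq Q \qquad (\mathcal{R}_1, \mathcal{G}_1) \text{ and } (\mathcal{R}_2, \mathcal{G}_2) \text{ are non-interfering}
  \end{array}}
  {\{\tau_1 \mapsto C_1\} \uplus \{\tau_2 \mapsto C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}_1 \cup \mathcal{R}_2 \cup \{P, Q\}, \mathcal{G}_1 \cup \mathcal{G}_2, Q)}
\]
\[
\text{FORK-JOIN}\ \frac{\begin{array}{c}
  \mathcal{M} \vDash \{P\}\ \tau \mapsto \mathtt{FORK}(\tau_1, \tau_2)\ \{P'\} \qquad \mathcal{M} \vDash \{Q'\}\ \tau \mapsto \mathtt{JOIN}(\tau_1, \tau_2)\ \{Q\} \\
  \{\tau_1 \mapsto C_1\} \uplus \{\tau_2 \mapsto C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P', \mathcal{R}, \mathcal{G}, Q') \\
  \mathcal{G}' = \mathcal{G} \cup \{\{P\}\ \tau \mapsto \mathtt{FORK}(\tau_1, \tau_2),\ \{Q'\}\ \tau \mapsto \mathtt{JOIN}(\tau_1, \tau_2)\}
  \end{array}}
  {\{\tau \mapsto C_1 \mathbin{{}_{\tau_1}\|_{\tau_2}} C_2\} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R} \cup \{P, Q\}, \mathcal{G}', Q)}
\]

Fig. 7. Generic sequential RG proof rules (letting $[\![e]\!] = \{\langle \gamma, m \rangle \mid \gamma(e) = true\}$)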


We denote by Assume(P, R) the set of all computations that admit P and R, 
and by Commit(G, Q) the set of all computations that admit G and Q. 


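Then, validity of a judgment is defined as

\[
\vDash \mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q) \iff Comp(\mathcal{C}) \cap Assume(P, \mathcal{R}) \subseteq Commit(\mathcal{G}, Q)
\]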


Memory Triples. Our proof rules build on memory triples, which specify pre- 
and postconditions for primitive commands for a memory system M. 


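Definition 6. A memory triple for a memory system $\mathcal{M}$ is a tuple of the form
$\{P\}\ \tau \mapsto \alpha\ \{Q\}$, where $P, Q \subseteq \Sigma_{\mathcal{M}}$, $\tau \in \mathsf{Tid}$, and $\alpha$ is either an instrumented
primitive command, a fork label, or a join label. A memory triple for $\mathcal{M}$ is valid,
denoted by $\mathcal{M} \vDash \{P\}\ \tau \mapsto \alpha\ \{Q\}$, if the following hold for every $\langle \gamma, m \rangle \in P$,
$\gamma' \in \Gamma$ and $m' \in \mathcal{M}.\mathsf{Q}$:

- If $\alpha$ is an instrumented primitive command and
$\langle \{\tau \mapsto \alpha\}, \gamma, m \rangle \xrightarrow{\tau : l_\varepsilon}_{\overline{\mathcal{M}}} \langle \{\tau \mapsto \mathbf{skip}\}, \gamma', m' \rangle$ for some $l_\varepsilon \in \mathsf{Lab} \cup \{\varepsilon\}$, then $\langle \gamma', m' \rangle \in Q$.

- If $\alpha \in \{\mathtt{FORK}(\tau_1, \tau_2), \mathtt{JOIN}(\tau_1, \tau_2)\}$ and $m \xrightarrow{\tau, \alpha}_{\mathcal{M}} m'$, then $\langle \gamma, m' \rangle \in Q$.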


Example 2. For the memory system $\mathsf{SC}$ introduced in Example 1, we have, e.g.,
memory triples of the form $\mathsf{SC} \vDash \{e(r := x)\}\ \tau \mapsto r := \mathtt{LOAD}(x)\ \{e\}$ (where
$e(r := x)$ is the expression $e$ with all occurrences of $r$ replaced by $x$).
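For a small value domain, the triple of Example 2 can be checked by brute force. The sketch below assumes two registers `r` and `s`, one location `x`, and the concrete expression `r + s = 2` for $e$; these choices and the helper names are illustrative only.

```python
from itertools import product

VALS = (0, 1, 2)  # assumed small value domain


def load_step(gamma, m, r, x):
    """SC step for r := LOAD(x): register r receives m[x]; memory is unchanged."""
    return {**gamma, r: m[x]}, dict(m)


def triple_valid(pre, post, r, x):
    """Check SC |= {pre} r := LOAD(x) {post} by enumerating all small states."""
    for rv, sv, xv in product(VALS, repeat=3):
        gamma, m = {"r": rv, "s": sv}, {"x": xv}
        if pre(gamma, m):
            gamma2, m2 = load_step(gamma, m, r, x)
            if not post(gamma2, m2):
                return False
    return True


def e(gamma, m):
    """The postcondition e: r + s = 2."""
    return gamma["r"] + gamma["s"] == 2


def e_subst(gamma, m):
    """The precondition e(r := x): x + s = 2."""
    return m["x"] + gamma["s"] == 2


valid = triple_valid(e_subst, e, "r", "x")
too_weak = triple_valid(lambda gamma, m: True, e, "r", "x")
```

With the substituted precondition the check succeeds, while the trivial precondition `True` is too weak, as backward substitution suggests.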


RG Proof Rules. We aim at proof rules deriving valid RG judgments. Figure 7
lists (semantic) proof rules based on externally provided memory triples. These
rules basically follow RG reasoning for sequential consistency. For example, rule
SEQ states that RG judgments of commands $C_1$ and $C_2$ can be combined when
the postcondition of $C_1$ and the precondition of $C_2$ agree, thereby uniting their
relies and guarantees. Rule COM builds on memory triples. The rule PAR for
parallel composition combines judgments for two components when their relies
and guarantees are non-interfering. Intuitively speaking, this means that each
of the assertions that each thread relied on for establishing its proof is preserved
when applying any of the assignments collected in the guarantee set of the other
thread. An example of non-interfering rely-guarantee pairs is given in step 7 in
Sect. 2. Formally, non-interference is defined as follows:


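Definition 7. Rely-guarantee pairs $(\mathcal{R}_1, \mathcal{G}_1)$ and $(\mathcal{R}_2, \mathcal{G}_2)$ are non-interfering if
$\mathcal{M} \vDash \{R \cap P\}\ \tau \mapsto \alpha\ \{R\}$ holds for every $R \in \mathcal{R}_1$ and $\{P\}\ \tau \mapsto \alpha \in \mathcal{G}_2$, and
similarly for every $R \in \mathcal{R}_2$ and $\{P\}\ \tau \mapsto \alpha \in \mathcal{G}_1$.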
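Under SC and a finite value domain, the non-interference check of Definition 7 can be carried out by enumeration. The sketch below makes several simplifying assumptions that are not from the paper: states consist of the memory only (registers are ignored), every guarded command is a plain store encoded as a triple `(guard, loc, val)`, and relies and guards are Python predicates.

```python
from itertools import product

LOCS = ("x", "y")
VALS = (0, 1)


def all_stores():
    """Enumerate every SC memory state over LOCS and VALS."""
    for vals in product(VALS, repeat=len(LOCS)):
        yield dict(zip(LOCS, vals))


def preserved(rely, guard, loc, val):
    """Check SC |= {R ∩ P} STORE(loc, val) {R} by enumeration."""
    for m in all_stores():
        if rely(m) and guard(m) and not rely({**m, loc: val}):
            return False
    return True


def one_way(relies, guarantees):
    return all(preserved(r, g, loc, val)
               for r in relies for (g, loc, val) in guarantees)


def non_interfering(relies_1, guar_1, relies_2, guar_2):
    """Both directions of Definition 7."""
    return one_way(relies_1, guar_2) and one_way(relies_2, guar_1)


def inv(m):
    """Message-passing rely of the reader: y = 1 implies x = 1."""
    return m["y"] != 1 or m["x"] == 1


def true(m):
    return True


def x_is_1(m):
    return m["x"] == 1


ok = non_interfering([], [(true, "x", 1), (x_is_1, "y", 1)], [inv], [])
bad = non_interfering([], [(true, "x", 1), (true, "y", 1)], [inv], [])
```

The second call drops the guard on the write to `y`, so the store may fire in a state where `x` is still 0 and the reader's rely is no longer preserved.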


In turn, FORK-JOIN combines the proof of a parallel composition with proofs 
of fork and join steps (which may also affect the memory state). Note that the 
guarantees also involve guarded commands with FORK and JOIN labels. 

Additional rules for consequence and introduction of auxiliary variables are 
elided here (they are similar to their SC counterparts), and provided in the 
extended version of this paper [30]. 


Soundness. To establish soundness of the above system we need an additional
requirement regarding the internal memory transitions (for SC this closure
vacuously holds as there are no such transitions). We require all relies in $\mathcal{R}$ to be
stable under internal memory transitions, i.e., for $R \in \mathcal{R}$ we require

\[
\forall \gamma, m, m', \theta \in \mathcal{M}.\Theta.\ \ m \xrightarrow{\theta}_{\mathcal{M}} m' \Rightarrow (\langle \gamma, m \rangle \in R \Rightarrow \langle \gamma, m' \rangle \in R) \tag{mem}
\]

This condition is needed since the memory system can non-deterministically
take its internal steps, and the component's proof has to be stable under such
steps.


Theorem 1 (Soundness). $\vdash \mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q) \implies\ \vDash \mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q)$.


With this requirement, we are able to establish soundness. The proof, which
generally follows [48], is given in the extended version of this paper [30]. We
write $\vdash \mathcal{C} \ \mathsf{sat}_{\mathcal{M}}\ (P, \mathcal{R}, \mathcal{G}, Q)$ for provability of a judgment using the semantic
rules presented above.
5 Potential-Based Memory System for SRA


In this section we present the potential-based semantics for Strong Release- 
Acquire (SRA), for which we develop a novel RG logic. Our semantics is based 
on the one in [27,28], with certain adaptations to make it better suited for 
Hoare-style reasoning (see Remark 2). 

In weak memory models, threads typically have different views of the shared 
memory. In SRA, we refer to a memory snapshot that a thread may observe as 
a potential store: 


Definition 8. A potential store is a function $\delta : \mathsf{Loc} \to \mathsf{Val} \times \{\mathtt{R}, \mathtt{RMW}\} \times \mathsf{Tid}$. We
write $val(\delta(x))$, $rmw(\delta(x))$, and $tid(\delta(x))$ to retrieve the different components
of $\delta(x)$. We denote by $\Delta$ the set of all potential stores.


Having $\delta(x) = \langle v, \mathtt{R}, \tau \rangle$ allows a thread to read the value $v$ from $x$ (and further
records that this read reads from a write performed by thread $\tau$, which is
technically needed to properly characterize the SRA model). In turn, having
$\delta(x) = \langle v, \mathtt{RMW}, \tau \rangle$ further allows performing an RMW instruction that atomically
reads and modifies $x$.

Potential stores are collected in potential store lists describing the values 
which can (potentially) be read and in what order. 


Notation 9. Lists over an alphabet $A$ are written as $L = a_1 \cdot \ldots \cdot a_n$ where
$a_1, \ldots, a_n \in A$. We also use $\cdot$ to concatenate lists, and write $L[i]$ for the $i$th
element of $L$ and $|L|$ for the length of $L$.


A (potential) store list is a finite sequence of potential stores ascribing a 
possible sequence of stores that a thread can observe, in the order it will observe 
them. The RMW-flags in these lists have to satisfy certain conditions: once the 
flag for a location is set, it remains set in the rest of the list; and the flag must 
be set at the end of the list. Formally, store lists are defined as follows. 


Definition 10. A store list $L \in \mathcal{L}$ is a non-empty finite sequence of potential
stores with monotone RMW-flags ending with an RMW, that is: for all $x \in \mathsf{Loc}$,


1. if $rmw(L[i](x)) = \mathtt{RMW}$, then $rmw(L[j](x)) = \mathtt{RMW}$ for every $i < j \leq |L|$, and
2. $rmw(L[|L|](x)) = \mathtt{RMW}$.
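The two conditions of Definition 10 translate directly into a well-formedness check. In the sketch below a potential store is assumed to be a dictionary from locations to `(value, flag, thread)` triples, and a store list is a Python list of such dictionaries; this encoding is illustrative only.

```python
R, RMW = "R", "RMW"


def is_store_list(L):
    """Check Definition 10: non-empty, monotone RMW flags, RMW at the end."""
    if not L:
        return False
    for x in L[0]:
        flags = [delta[x][1] for delta in L]
        # Condition 2: the flag of the final store must be set.
        if flags[-1] != RMW:
            return False
        # Condition 1: once the flag is set, it stays set.
        first_rmw = flags.index(RMW)
        if any(f != RMW for f in flags[first_rmw:]):
            return False
    return True


good = [{"x": (0, R, "T0")}, {"x": (1, RMW, "T1")}, {"x": (1, RMW, "T1")}]
bad_end = [{"x": (0, RMW, "T0")}, {"x": (1, R, "T1")}]
bad_mono = [{"x": (0, RMW, "T0")}, {"x": (0, R, "T0")}, {"x": (1, RMW, "T1")}]
```

Here `good` is accepted, `bad_end` violates condition 2, and `bad_mono` violates condition 1.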


Now, SRA states (SRA.Q) consist of potential mappings that assign potentials 
to threads as defined next. 


Definition 11. A potential $D$ is a non-empty set of potential store lists. A
potential mapping is a function $\mathcal{D} : \mathsf{Tid} \rightharpoonup \mathcal{P}(\mathcal{L}) \setminus \{\emptyset\}$ that maps thread identifiers
to potentials such that all lists agree on the very final potential store (that is:
$L_1[|L_1|] = L_2[|L_2|]$ whenever $L_1 \in \mathcal{D}(\tau_1)$ and $L_2 \in \mathcal{D}(\tau_2)$).


These potential mappings are "lossy", meaning that potential stores can be
arbitrarily dropped. In particular, dropping the first store in a list enables reading
from the second. This is formally done by transitioning from a state $\mathcal{D}$ to a
"smaller" state $\mathcal{D}'$ as defined next.
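\[
\text{WRITE}\ \frac{\begin{array}{l}
  \forall L' \in \mathcal{D}'(\tau).\ \exists L \in \mathcal{D}(\tau).\ L' = L[x \mapsto \langle v_W, \mathtt{RMW}, \tau \rangle] \\
  \forall \pi \in dom(\mathcal{D}) \setminus \{\tau\},\ L' \in \mathcal{D}'(\pi).\ \exists L_0, L_1. \\
  \qquad L_0 \cdot L_1 \in \mathcal{D}(\pi) \wedge L_1 \in \mathcal{D}(\tau) \wedge
  L' = L_0[x \mapsto \mathtt{R}] \cdot L_1[x \mapsto \langle v_W, \mathtt{RMW}, \tau \rangle]
  \end{array}}
  {\mathcal{D} \xrightarrow{\tau, \mathtt{W}(x, v_W)}_{\mathsf{SRA}} \mathcal{D}'}
\]
\[
\text{LOSE}\ \frac{\mathcal{D}' \sqsubseteq \mathcal{D}}{\mathcal{D} \xrightarrow{\varepsilon}_{\mathsf{SRA}} \mathcal{D}'}
\qquad
\text{DUP}\ \frac{\mathcal{D} \preceq \mathcal{D}'}{\mathcal{D} \xrightarrow{\varepsilon}_{\mathsf{SRA}} \mathcal{D}'}
\]
\[
\text{READ}\ \frac{\exists \pi.\ \forall L \in \mathcal{D}(\tau).\ val(L[1](x)) = v_R \wedge tid(L[1](x)) = \pi}
                  {\mathcal{D} \xrightarrow{\tau, \mathtt{R}(x, v_R)}_{\mathsf{SRA}} \mathcal{D}}
\qquad
\text{RMW}\ \frac{\begin{array}{c}
  \forall L \in \mathcal{D}(\tau).\ rmw(L[1](x)) = \mathtt{RMW} \\
  \mathcal{D} \xrightarrow{\tau, \mathtt{R}(x, v_R)}_{\mathsf{SRA}} \mathcal{D} \xrightarrow{\tau, \mathtt{W}(x, v_W)}_{\mathsf{SRA}} \mathcal{D}'
  \end{array}}
  {\mathcal{D} \xrightarrow{\tau, \mathtt{RMW}(x, v_R, v_W)}_{\mathsf{SRA}} \mathcal{D}'}
\]
\[
\text{FORK}\ \frac{\begin{array}{c}
  \mathcal{D}_{new} = \{\tau_1 \mapsto \mathcal{D}(\tau),\ \tau_2 \mapsto \mathcal{D}(\tau)\} \\
  \mathcal{D}' = \mathcal{D}|_{dom(\mathcal{D}) \setminus \{\tau\}} \uplus \mathcal{D}_{new}
  \end{array}}
  {\mathcal{D} \xrightarrow{\tau, \mathtt{FORK}(\tau_1, \tau_2)}_{\mathsf{SRA}} \mathcal{D}'}
\qquad
\text{JOIN}\ \frac{\begin{array}{c}
  \mathcal{D}_{new} = \{\tau \mapsto \mathcal{D}(\tau_1) \cap \mathcal{D}(\tau_2)\} \\
  \mathcal{D}' = \mathcal{D}|_{dom(\mathcal{D}) \setminus \{\tau_1, \tau_2\}} \uplus \mathcal{D}_{new}
  \end{array}}
  {\mathcal{D} \xrightarrow{\tau, \mathtt{JOIN}(\tau_1, \tau_2)}_{\mathsf{SRA}} \mathcal{D}'}
\]

Fig. 8. Steps of SRA (defining $\delta[x \mapsto \langle v, u, \tau \rangle](y) = \langle v, u, \tau \rangle$ if $y = x$ and $\delta(y)$ else, and
$\delta[x \mapsto \mathtt{R}]$ to set all RMW-flags for $x$ to $\mathtt{R}$; both pointwise lifted to lists)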


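Definition 12. The (overloaded) partial order $\sqsubseteq$ is defined as follows:


1. on potential store lists: $L' \sqsubseteq L$ if $L'$ is a nonempty subsequence of $L$;
2. on potentials: $D' \sqsubseteq D$ if $\forall L' \in D'.\ \exists L \in D.\ L' \sqsubseteq L$;
3. on potential mappings: $\mathcal{D}' \sqsubseteq \mathcal{D}$ if $\mathcal{D}'(\tau) \sqsubseteq \mathcal{D}(\tau)$ for every $\tau \in dom(\mathcal{D})$.


We also define $L \preceq L'$ if $L'$ is obtained from $L$ by duplication of some stores
(e.g., $\delta_1 \cdot \delta_2 \cdot \delta_3 \preceq \delta_1 \cdot \delta_2 \cdot \delta_2 \cdot \delta_3$). This is lifted to potential mappings as expected.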
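Both orders on store lists can be decided by a single left-to-right scan. The sketch below works on lists of arbitrary comparable items standing in for potential stores; the function names are illustrative, and the duplication check is assumed to be reflexive (duplicating no store at all is allowed).

```python
def is_subsequence(L_small, L):
    """L_small ⊑ L: L_small is a non-empty subsequence of L."""
    if not L_small:
        return False
    it = iter(L)
    # Each store of L_small must be found in L after the previous match.
    return all(any(d == e for e in it) for d in L_small)


def is_duplication(L, L_big):
    """L ⪯ L_big: L_big repeats each store of L one or more times, in order."""
    if not L or not L_big or L_big[0] != L[0]:
        return False
    i = 0  # index of the store of L currently being repeated
    for d in L_big[1:]:
        if i + 1 < len(L) and d == L[i + 1]:
            i += 1          # move on to the next store of L
        elif d != L[i]:
            return False    # neither a repetition nor the next store
    return i == len(L) - 1


lose_ok = is_subsequence(["d1", "d3"], ["d1", "d2", "d3"])
dup_ok = is_duplication(["d1", "d2", "d3"], ["d1", "d2", "d2", "d3"])
```

Advancing eagerly in `is_duplication` is safe: when two adjacent stores of `L` are equal, any item that could still count as a repetition of the first can equally be counted towards the second.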

Figure 8 defines the transitions of SRA. The LOSE and DUP steps account for 
losing and duplication in potentials. Note that these are both internal memory 
transitions (required to preserve relies as of (mem)). The FORK and JOIN steps 
distribute potentials on forked threads and join them at the end. The READ 
step obtains its value from the first store in the lists of the potential of the 
reader, provided that all these lists agree on that value and the writer thread 
identifier. RMW steps atomically perform a read and a write step where the read 
is restricted to an RMW-marked entry. 

Most of the complexity is left for the WRITE step. For the writer thread $\tau$, it
updates all stores in all lists to the newly written value. For every other thread,
it updates a suffix ($L_1$) of the store list with the new value. For guaranteeing
causal consistency this updated suffix cannot be arbitrary: it has to be in the
potential of the writer thread ($L_1 \in \mathcal{D}(\tau)$). This is the key to achieving the
"shared-memory causality principle" of [28], which ensures causal consistency.


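Example 3. Consider again the MP program from Fig. 2. After the initial fork
step, threads $T_1$ and $T_2$ may have the following store list in their potentials:

\[
L = \begin{bmatrix} x \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\]

Then, STORE(x, 1) by $T_1$ can generate the following store list for $T_2$:

\[
L_2 = \begin{bmatrix} x \mapsto \langle 0, \mathtt{R}, T_0 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\]

Thus $T_2$ keeps the possibility of reading the "old" value of $x$. For $T_1$ this is
different: the model allows the writing thread to only see its new value of $x$ and
all entries for $x$ in the store list are updated. Thus, for $T_1$ we obtain store list

\[
L_1 = \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 0, \mathtt{RMW}, T_0 \rangle \end{bmatrix}
\]

Next, when $T_1$ executes STORE(y, 1), again, the value for $y$ has to be updated to
1 in $T_1$, yielding

\[
L_1' = \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\]

For $T_2$ the write step may change $L_2$ to

\[
L_2' = \begin{bmatrix} x \mapsto \langle 0, \mathtt{R}, T_0 \rangle \\ y \mapsto \langle 0, \mathtt{R}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 0, \mathtt{R}, T_0 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\]

Thus, thread $T_2$ can still see the old values, or lose the prefix of its list and see
the new values. Importantly, it cannot read 1 from $y$ and then 0 from $x$. Note
that STORE(y, 1) by $T_1$ cannot modify $L_2$ to the list

\[
L_2'' = \begin{bmatrix} x \mapsto \langle 0, \mathtt{R}, T_0 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\cdot \begin{bmatrix} x \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \\ y \mapsto \langle 1, \mathtt{RMW}, T_1 \rangle \end{bmatrix}
\]

as it requires $T_1$ to have $L_2$ in its own potential. This models the intended
semantics of message passing under causal consistency.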
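The effect of the WRITE step on a single store list of a non-writing thread can be replayed mechanically. The following Python sketch is a simplification under stated assumptions: it enumerates the candidate lists $L_0[x \mapsto \mathtt{R}] \cdot L_1[x \mapsto \langle v, \mathtt{RMW}, \tau \rangle]$ for one list only (the actual rule relates whole potential mappings), the writer's potential is assumed to contain every non-empty suffix of its list (as LOSE permits), and all names and the tuple encoding are illustrative.

```python
R, RMW = "R", "RMW"


def store(x, y):
    """A potential store over locations x and y, as a hashable tuple."""
    return (("x", x), ("y", y))


def set_entry(delta, loc, entry):
    """delta[loc -> entry]."""
    return tuple((l, entry if l == loc else e) for l, e in delta)


def clear_flag(delta, loc):
    """delta[loc -> R]: reset the RMW flag of loc."""
    return tuple((l, (e[0], R, e[2]) if l == loc else e) for l, e in delta)


def write_own(L, loc, val, tau):
    """Writer's view: every store in the list gets the new value."""
    return tuple(set_entry(d, loc, (val, RMW, tau)) for d in L)


def write_other(L, writer_potential, loc, val, tau):
    """All lists L0[loc -> R] . L1[loc -> (val, RMW, tau)] with L = L0 . L1
    and L1 in the writer's potential."""
    results = set()
    for k in range(len(L)):          # L1 = L[k:] is non-empty
        L0, L1 = L[:k], L[k:]
        if L1 in writer_potential:
            prefix = tuple(clear_flag(d, loc) for d in L0)
            results.add(prefix + write_own(L1, loc, val, tau))
    return results


init = store((0, RMW, "T0"), (0, RMW, "T0"))
L = (init,) * 3
pot_T1 = {(init,) * n for n in (1, 2, 3)}

# STORE(x, 1) by T1, seen from T2.
after_x = write_other(L, pot_T1, "x", 1, "T1")
L2 = (store((0, R, "T0"), (0, RMW, "T0")),
      store((1, RMW, "T1"), (0, RMW, "T0")),
      store((1, RMW, "T1"), (0, RMW, "T0")))

# STORE(y, 1) by T1, seen from T2 holding L2.
pot_T1_x = {write_own(Lw, "x", 1, "T1") for Lw in pot_T1}
after_y = write_other(L2, pot_T1_x, "y", 1, "T1")
```

In this replay `L2` is among the candidates after the first write, and no candidate after the second write starts with a store offering the new `y` together with the old `x`, matching the observation at the end of Example 3.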


The next theorem establishes the equivalence of SRA as defined above and 
opSRA from [28], which is an (operational version of) the standard strong release- 
acquire declarative semantics [26,31]. (As a corollary, we obtain the equivalence 
between the potential-based system from [28] and the variant we define in this 
paper.) 

Our notion of equivalence employed in the theorem is trace equivalence. We
let a trace of a memory system be a sequence of transition labels, ignoring
$\varepsilon$ transitions, and consider traces of SRA starting from an initial state
$\lambda \tau \in \{T_1, \ldots, T_N\}.\ \{\langle \lambda x.\ \langle 0, \mathtt{RMW}, T_0 \rangle \rangle\}$ and traces of opSRA starting from the initial
execution graph that consists of a write event to every location writing 0 by a
distinguished initialization thread $T_0$.


Theorem 2. A trace is generated by SRA iff it is generated by opSRA. 


The proof of this theorem is by simulation arguments (forward simulation
in one direction and backward for the converse). It is mechanized in Coq [29].
The mechanized proof does not consider fork and join steps, but they can be
straightforwardly added.




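extended expressions   $E ::= e \mid x \mid \mathtt{R}(x) \mid E + E \mid \neg E \mid E \wedge E \mid \ldots$
interval assertions    $I ::= [E] \mid I ; I \mid I \wedge I \mid I \vee I$
assertions             $\varphi, \psi ::= \tau \ltimes I \mid e \mid \varphi \wedge \varphi \mid \varphi \vee \varphi$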


Fig. 9. Assertions of Piccolo 


6 Program Logic 


For the instantiation of our RG framework to SRA, we next (1) introduce the 
assertions of the logic Piccolo and (2) specify memory triples for Piccolo. Our logic 
is inspired by interval logics like Moszkowski’s ITL [35] or duration calculus [13]. 


Syntax and Semantics. Figure 9 gives the grammar of Piccolo. We base it on
extended expressions which, besides registers, can also involve locations as well
as expressions of the form $\mathtt{R}(x)$ (to indicate RMW-flag $\mathtt{R}$). Extended expressions
$E$ can hold on entire intervals of a store list (denoted $[E]$). Store lists can be split
into intervals satisfying different interval expressions ($I_1 ; \ldots ; I_n$) using the ";"
operator (called "chop"). In turn, $\tau \ltimes I$ means that all store lists in $\tau$'s potential
satisfy $I$. For an assertion $\varphi$, we let $fv(\varphi) \subseteq \mathsf{Reg} \cup \mathsf{Loc} \cup \mathsf{Tid}$ be the set of registers,
locations and thread identifiers occurring in $\varphi$, and write $\mathtt{R}(x) \in \varphi$ to indicate
that the term $\mathtt{R}(x)$ occurs in $\varphi$.

As an example consider again MP (Fig. 2). We would like to express that $T_2$,
upon seeing $y$ to be 1, cannot see the old value 0 of $x$ anymore. In Piccolo this
is expressed as $T_2 \ltimes [y \neq 1] ; [x = 1]$: the store lists of $T_2$ can be split into two
intervals (one possibly empty), the first satisfying $y \neq 1$ and the second $x = 1$.

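Formally, an assertion $\varphi$ describes register stores coupled with SRA states:


Definition 13. Let $\gamma$ be a register store, $\delta$ a potential store, $L$ a store list, and
$\mathcal{D}$ a potential mapping. We let $[\![e]\!]_{\langle \gamma, \delta \rangle} = \gamma(e)$, $[\![x]\!]_{\langle \gamma, \delta \rangle} = \delta(x)$, and $[\![\mathtt{R}(x)]\!]_{\langle \gamma, \delta \rangle} =$
if $rmw(\delta(x)) = \mathtt{R}$ then true else false. The extension of this notation to any
extended expression $E$ is standard. The validity of assertions in $\langle \gamma, \mathcal{D} \rangle$, denoted
by $\langle \gamma, \mathcal{D} \rangle \vDash \varphi$, is defined as follows:


1. $\langle \gamma, L \rangle \vDash [E]$ if $[\![E]\!]_{\langle \gamma, \delta \rangle} = true$ for every $\delta \in L$.
2. $\langle \gamma, L \rangle \vDash I_1 ; I_2$ if $\langle \gamma, L_1 \rangle \vDash I_1$ and $\langle \gamma, L_2 \rangle \vDash I_2$ for some (possibly empty) $L_1$
and $L_2$ such that $L = L_1 \cdot L_2$.
3. $\langle \gamma, L \rangle \vDash I_1 \wedge I_2$ if $\langle \gamma, L \rangle \vDash I_1$ and $\langle \gamma, L \rangle \vDash I_2$ (similarly for $\vee$).
4. $\langle \gamma, \mathcal{D} \rangle \vDash \tau \ltimes I$ if $\langle \gamma, L \rangle \vDash I$ for every $L \in \mathcal{D}(\tau)$.
5. $\langle \gamma, \mathcal{D} \rangle \vDash e$ if $\gamma(e) = true$.
6. $\langle \gamma, \mathcal{D} \rangle \vDash \varphi_1 \wedge \varphi_2$ if $\langle \gamma, \mathcal{D} \rangle \vDash \varphi_1$ and $\langle \gamma, \mathcal{D} \rangle \vDash \varphi_2$ (similarly for $\vee$).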

Note that with $\wedge$ and $\vee$ as well as negation on expressions,¹ the logic provides
the operators on sets of states necessary for an instantiation of our RG framework.
Further, the requirements from SRA states guarantee certain properties:


¹ Negation just occurs on the level of simple expressions $e$, which is sufficient for
calculating $P \setminus [\![e]\!]$ required in rules IF and WHILE.




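Assumption                     Pre                                                   Command                                   Post                                                           Reference

                               $\{\varphi(r := e)\}$                                 $\tau \mapsto r := e$                     $\{\varphi\}$                                                  SUBST-ASGN
$x \notin fv(\varphi)$         $\{\varphi\}$                                         $\tau \mapsto \mathtt{WRITE}(x, e)$       $\{\varphi\}$                                                  STABLE-WR
$r \notin fv(\varphi)$         $\{\varphi\}$                                         $\tau \mapsto r := \mathtt{LOAD}(x)$      $\{\varphi\}$                                                  STABLE-LD
$\tau_1, \tau_2 \notin fv(\varphi)$  $\{\varphi\}$                                   $\tau \mapsto \mathtt{FORK}(\tau_1, \tau_2)$  $\{\varphi\}$                                              STABLE-FORK
$\tau \notin fv(\varphi)$      $\{\varphi\}$                                         $\tau \mapsto \mathtt{JOIN}(\tau_1, \tau_2)$  $\{\varphi\}$                                              STABLE-JOIN

                               $\{e \wedge \tau \ltimes I\}$                         $\tau \mapsto \mathtt{FORK}(\tau_1, \tau_2)$  $\{e \wedge \tau_1 \ltimes I \wedge \tau_2 \ltimes I\}$    FORK
                               $\{e \wedge \tau_1 \ltimes I \wedge \tau_2 \ltimes I\}$  $\tau \mapsto \mathtt{JOIN}(\tau_1, \tau_2)$  $\{e \wedge \tau \ltimes I\}$                           JOIN
                               $\{True\}$                                            $\tau \mapsto \mathtt{WRITE}(x, e)$       $\{\tau \ltimes [x = e]\}$                                     WR-OWN
$\mathtt{R}(x) \notin I$       $\{\pi \ltimes I\}$                                   $\tau \mapsto \mathtt{WRITE}(x, e)$       $\{\pi \ltimes (I \wedge [\mathtt{R}(x)]) ; [x = e]\}$         WR-OTHER-1
                               $\{\tau \ltimes I_\tau \wedge \pi \ltimes I ; I_\tau\}$  $\tau \mapsto \mathtt{WRITE}(x, e)$    $\{\pi \ltimes I ; I_\tau\}$                                   WR-OTHER-2
$x \notin fv(I_\tau)$          $\{\tau \ltimes I_\tau\}$                             $\tau \mapsto \mathtt{WRITE}(x, e)$       $\{\pi \ltimes [\mathtt{R}(x)] ; I_\tau\}$                     WR-OTHER-3
$x \notin fv(I)$               $\{\tau \ltimes [\mathtt{R}(x)] ; I\}$                $\tau \mapsto \mathtt{SWAP}(x, e)$        $\{\tau \ltimes I\}$                                           SWAP-SKIP


Fig. 10. Memory triples for Piccolo using WRITE $\in$ {SWAP, STORE} and assuming $\tau \neq \pi$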


- For $\varphi_1 = \tau \ltimes [E^\tau_1] ; \ldots ; [E^\tau_n]$ and $\varphi_2 = \pi \ltimes [E^\pi_1] ; \ldots ; [E^\pi_m]$: if $E^\tau_i \wedge E^\pi_j \Rightarrow False$
for all $1 \leq i \leq n$ and $1 \leq j \leq m$, then $\varphi_1 \wedge \varphi_2 \Rightarrow False$ (follows from the fact
that all lists in potentials are non-empty and agree on the last store).

- If $\langle \gamma, \mathcal{D} \rangle \vDash \tau \ltimes [\mathtt{R}(x)] ; [E]$, then every list $L \in \mathcal{D}(\tau)$ contains a non-empty
suffix satisfying $E$ (since all lists have to end with RMW-flags set on).


All assertions are preserved by steps LOSE and DUP. This stability is required
by our RG framework (Condition (mem)).² Stability is achieved here because
negations occur on the level of (simple) expressions only (e.g., we cannot have
$\neg(\tau \ltimes [x = v])$, meaning that $\tau$ must have a store in its potential whose value for
$x$ is not $v$, which would not be stable under LOSE).


Proposition 1. If $\langle \gamma, \mathcal{D} \rangle \vDash \varphi$ and $\mathcal{D} \xrightarrow{\varepsilon}_{\mathsf{SRA}} \mathcal{D}'$, then $\langle \gamma, \mathcal{D}' \rangle \vDash \varphi$.


Memory Triples. Assertions in Piccolo describe sets of states, thus can be used 
to formulate memory triples. Figure 10 gives the base triples for the different 
primitive instructions. 

We see the standard SC rule of assignment (SUBST-ASGN) for registers fol- 
lowed by a number of stability rules detailing when assertions are not affected 
by instructions. Axioms FORK and JOIN describe the transfer of properties from 
forking thread to forked threads and back. 

The next four axioms in the table concern write instructions (either SWAP or
STORE). They reflect the semantics of writing in SRA: (1) In the writer thread $\tau$
all stores in all lists get updated (axiom WR-OWN). Other threads $\pi$ will have
(2) their lists being split into "old" values for $x$ with $\mathtt{R}$ flag and the new value
for $x$ (WR-OTHER-1), (3) properties (expressed as $I_\tau$) of suffixes of lists being
preserved when the writing thread satisfies the same properties (WR-OTHER-2)
and (4) their lists consisting of $\mathtt{R}$-accesses to $x$ followed by properties of the
writer (WR-OTHER-3). The last axiom concerns SWAP only: as it can only read
from store entries marked as RMW it discards intervals satisfying $[\mathtt{R}(x)]$.


² Such stability requirements are also common to other reasoning techniques for weak
memory models, e.g., [19].


Example 4. We employ the axioms for showing one proof step for MP, namely
one pair in the non-interference check of the rely $\mathcal{R}_2$ of $T_2$ with respect to the
guarantees $\mathcal{G}_1$ of $T_1$:

\[
\{T_2 \ltimes [y \neq 1] ; [x = 1] \wedge T_1 \ltimes [x = 1]\}\ \ T_1 \mapsto \mathtt{STORE}(x, 1)\ \ \{T_2 \ltimes [y \neq 1] ; [x = 1]\}
\]

By taking $I_\tau$ to be $[x = 1]$, this is an instance of WR-OTHER-2.
In addition to the axioms above, we use a shift rule for load instructions: 


\[
\text{LD-SHIFT}\ \frac{\{\tau \ltimes I\}\ \tau \mapsto r := \mathtt{LOAD}(x)\ \{\psi\}}
                      {\{\tau \ltimes [(e \wedge E)(r := x)] ; I\}\ \tau \mapsto r := \mathtt{LOAD}(x)\ \{(e \wedge \tau \ltimes [E] ; I) \vee \psi\}}
\]

A load instruction reads from the first store in the lists; however, if the interval
satisfying $[(e \wedge E)(r := x)]$ in $[(e \wedge E)(r := x)] ; I$ is empty, it reads from a list
satisfying $I$. The shift rule for LOAD puts this shifting to next stores into a proof
rule. Like the standard Hoare rule SUBST-ASGN, LD-SHIFT employs backward
substitution.


Example 5. We exemplify rule LD-SHIFT on another proof step of example MP,
one for local correctness of $T_2$:

\[
\{T_2 \ltimes [y \neq 1] ; [x = 1]\}\ \ T_2 \mapsto a := \mathtt{LOAD}(y)\ \ \{a = 1 \Rightarrow T_2 \ltimes [x = 1]\}
\]

From axiom STABLE-LD we get $\{T_2 \ltimes [x = 1]\}\ T_2 \mapsto a := \mathtt{LOAD}(y)\ \{T_2 \ltimes [x = 1]\}$.
We obtain $\{T_2 \ltimes [y \neq 1] ; [x = 1]\}\ T_2 \mapsto a := \mathtt{LOAD}(y)\ \{a \neq 1 \vee T_2 \ltimes [x = 1]\}$ using
the former as premise for LD-SHIFT.
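One way to read this step as an instance of LD-SHIFT is sketched below. The concrete choice of $e$ and $E$ is an assumption made here for illustration (the text does not spell it out): take $r := a$, the loaded location to be $y$, $e := (a \neq 1)$, $E := True$, $I := [x = 1]$ and $\psi := T_2 \ltimes [x = 1]$.

```latex
% Backward substitution in the precondition (assumed instantiation):
(e \wedge E)(a := y) \;=\; (a \neq 1 \wedge True)(a := y) \;\equiv\; (y \neq 1)

% Conclusion of LD-SHIFT for this instantiation:
\{T_2 \ltimes [y \neq 1] ; [x = 1]\}\ \ T_2 \mapsto a := \mathtt{LOAD}(y)\ \
\{(a \neq 1 \wedge T_2 \ltimes [True] ; [x = 1]) \vee T_2 \ltimes [x = 1]\}

% Weakening the first disjunct with the consequence rule:
(a \neq 1 \wedge T_2 \ltimes [True] ; [x = 1]) \vee T_2 \ltimes [x = 1]
\;\Rightarrow\; a \neq 1 \vee T_2 \ltimes [x = 1]
\;\equiv\; (a = 1 \Rightarrow T_2 \ltimes [x = 1])
```

Under this reading, the first disjunct covers the case where the load reads from the leading interval (so $a \neq 1$), and the second covers the case where that interval is empty and the load reads from the $[x = 1]$ part.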


In addition, we include the standard conjunction, disjunction and conse- 
quence rules of Hoare logic. For instrumented primitive commands we employ 
the following rule: 


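\[
\text{INSTR}\ \frac{\{\psi_0\}\ \tau \mapsto c\ \{\psi_1\} \qquad \{\psi_1\}\ \tau \mapsto r_1 := e_1\ \{\psi_2\} \qquad \cdots \qquad \{\psi_n\}\ \tau \mapsto r_n := e_n\ \{\psi_{n+1}\}}
                   {\{\psi_0\}\ \tau \mapsto \langle c, \langle r_1, \ldots, r_n \rangle := \langle e_1, \ldots, e_n \rangle \rangle\ \{\psi_{n+1}\}}
\]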


Finally, it can be shown that all triples derivable from axioms and rules are 
valid memory triples. 


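Lemma 1. If a Piccolo memory triple is derivable, $\vdash_{\text{Piccolo}} \{\varphi\}\ \tau \mapsto \alpha\ \{\psi\}$, then
$\mathsf{SRA} \vDash \{\{\langle \gamma, \mathcal{D} \rangle \mid \langle \gamma, \mathcal{D} \rangle \vDash \varphi\}\}\ \tau \mapsto \alpha\ \{\{\langle \gamma, \mathcal{D} \rangle \mid \langle \gamma, \mathcal{D} \rangle \vDash \psi\}\}$.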
{$T_0 \ltimes [x \neq 2]$}
Thread Tı Thread T2 
{Tix Ij} {T2 x Ið} 


1 : STORE(x, 1); || 3 : a := LOAD(x); 

{$T_1 \ltimes I^x_1$}              {$a = 2 \Rightarrow T_2 \ltimes I^x_2$}

2 : STORE(x, 2)       ||  4 : b := LOAD(x)

{$True$}                     {$a = 2 \Rightarrow b = 2$}
{$a = 2 \Rightarrow b \neq 1$}
Fig. 11. RRC for two threads (a.k.a. CoRR0)


{To x Iò} 

Thread Tı Thread Tə Thread Ts Thread T4 
{Ti K Ið2} {Ti x Ið} {Ts x (Iði2 V Ið21)} {Ta x (Iia V Td21)} 
1 : STORE(x, 1) || 2 : STORE(x, 2) || 3 : a := LOAD (x); 5: c := LOAD(x); 
{True} {True} {a = 2 > T3 x I3 } {c = 1 > Tax Iia} 

4 : b := LOAD (x) 6 : d := LOAD(x) 

{$(a, b) = (2, 1) \Rightarrow T_3 \ltimes I^x_1$}    {$(c, d) = (1, 2) \Rightarrow T_4 \ltimes I^x_2$}

{$(a, b) = (2, 1) \Rightarrow (c, d) \neq (1, 2)$}
Fig. 12. RRC for four threads (a.k.a. CoRR2) 

7 Examples 


We discuss examples verified in Piccolo. Additional examples can be found in 
the extended version of this paper [30]. 


Coherence. We provide two coherence examples in Figs. 11 and 12, using the
notation $I^x_{v_1 v_2 \ldots v_n} = [x = v_1] ; [x = v_2] ; \ldots ; [x = v_n]$. Figure 11 enforces an
ordering on writes to the shared location $x$ on thread $T_1$. The postcondition
guarantees that after reading the second write, thread $T_2$ cannot read from the
first. Figure 12 is similar, but the writes to $x$ occur on two different threads. The
postcondition of the program guarantees that the two different threads agree on
the order of the writes. In particular if one reading thread (here $T_3$) sees the
value 2 then 1, it is impossible for the other reading thread (here $T_4$) to see 1
then 2.

Potential assertions provide a compact and intuitive mechanism for reasoning,
e.g., in Fig. 11, the precondition of line 3 precisely expresses the order of
values available to thread $T_2$. This presents an improvement over view-based
assertions [16], which required a separate set of assertions to encode write order.


Peterson's Algorithm. Figure 13 shows Peterson's algorithm for implementing
mutual exclusion for two threads [38] together with Piccolo assertions. We
depict only the code of thread $T_1$. Thread $T_2$ is symmetric. A third thread $T_3$
is assumed to stop the other two threads at an arbitrary point in time. We
use do C until e as a shorthand for C ; while e do C. For correctness under
SRA, all accesses to the shared variable turn are via a SWAP, which ensures that
turn behaves like an SC variable.

Thread $T_1$

       {$\neg a_1 \wedge \neg a_2 \wedge mx_1 = 0$}
       while $\neg$stop do              {$\neg a_1 \wedge (\neg a_2 \vee T_1 \ltimes [\mathtt{R}(turn)] ; [flag_2])$}
  1:     STORE(flag_1, true);        {$\neg a_1 \wedge T_1 \ltimes [flag_1] \wedge (\neg a_2 \vee T_1 \ltimes [\mathtt{R}(turn)] ; [flag_2])$}
  2:     $\langle$SWAP(turn, 2); a_1 := true$\rangle$;
  3:     do                          {$a_1 \wedge (\neg a_2 \vee T_1 \ltimes [flag_2 \wedge turn \neq 1] \vee P)$}
  4:       fl_1 := LOAD(flag_2);     {$a_1 \wedge (\neg a_2 \vee (fl_1 \wedge T_1 \ltimes [flag_2 \wedge turn \neq 1]) \vee P)$}
  5:       tu_1 := LOAD(turn);       {$a_1 \wedge (\neg a_2 \vee (fl_1 \wedge tu_1 \neq 1 \wedge T_1 \ltimes [flag_2 \wedge turn \neq 1]) \vee P)$}
  6:     until $\neg fl_1 \vee (tu_1 = 1)$;   {$a_1 \wedge (\neg a_2 \vee P)$}
  7:     STORE(cs, $\bot$);             {$a_1 \wedge (\neg a_2 \vee P)$}
  8:     STORE(cs, 0);               {$T_1 \ltimes [cs = 0] \wedge a_1 \wedge (\neg a_2 \vee P)$}
  9:     mx_1 := LOAD(cs);           {$mx_1 = 0 \wedge a_1 \wedge (\neg a_2 \vee P)$}
 10:     $\langle$STORE(flag_1, 0); a_1 := false$\rangle$

       {$mx_1 = 0$}

Fig. 13. Peterson's algorithm, where $P = T_1 \ltimes [\mathtt{R}(turn)] ; [flag_2 \wedge turn = 1]$. Thread $T_2$
is symmetric and we assume a stopper thread $T_3$ that sets stop to true.

Correctness is encoded via registers $mx_1$ and $mx_2$ into which the contents of
shared variable cs is loaded. Mutual exclusion should guarantee both registers
to be 0. Thus neither thread should ever be able to read cs to be $\bot$ (as stored
in line 7). The proof (like the associated SC proof in [9]) introduces auxiliary
variables $a_1$ and $a_2$. Variable $a_i$ is initially false, set to true when a thread $T_i$
has performed its swap, and back to false when $T_i$ completes.

Once again potentials provide convenient mechanisms for reasoning about the
interactions between the two threads. For example, the assertion $T_1 \ltimes [\mathtt{R}(turn)] ;
[flag_2]$ in the precondition of line 2 encapsulates the idea that an RMW on
turn (via SWAP(turn, 2)) must read from a state in which $flag_2$ holds, allowing
us to establish $T_1 \ltimes [flag_2]$ as a postcondition (using the axiom SWAP-SKIP). We
obtain disjunct $T_1 \ltimes [flag_2 \wedge turn \neq 1]$ after additionally applying WR-OWN.


8 Discussion, Related and Future Work 


Previous RG-like logics provided ad-hoc solutions for other concrete mem- 
ory models such as x86-TSO and C/C++11 [11,16,17,32,39,40,47]. These 
approaches established soundness of the proposed logic with an ad-hoc proof 
that couples together memory and thread transitions. We believe that these log- 
ics can be formulated in our proposed general RG framework (which will require 
extensions to other memory operations such as fences). 

Moreover, Owicki-Gries logics for different fragments of the C11 memory
model [16,17,47] used specialized assertions over the underlying view-based
semantics. These include conditional-view assertions (enabling reasoning about
MP), and value-order assertions (enabling reasoning about coherence). Both types of
assertions are special cases of the potential-based assertions of Piccolo.

Ridge [40] presents an RG reasoning technique tailored to x86-TSO, treating
the write buffers in TSO architectures as threads whose steps have to preserve
relies. This is similar to our notion of stability of relies under internal memory
transitions. Ridge moreover allows memory-model specific assertions
(e.g., on the contents of write buffers).

The OGRA logic [32] for Release-Acquire (which is a slightly weaker form of
causal consistency compared to SRA studied in this paper) takes a different
approach, which cannot be directly handled in our framework. It employs simple
SC-like assertions at the price of having a non-standard non-interference
condition which requires a stronger form of stability.

Coughlin et al. [14,15] provide an RG reasoning technique for weak memory 
models with a semantics defined in terms of reordering relations (on instructions). 
They study both multicopy and non-multicopy atomic architectures, but in all 
models, the rely-guarantee assertions are interpreted over SC. 

Schellhorn et al. [41] develop a framework that extends ITL with a composi- 
tional interleaving operator, enabling proof decomposition using RG rules. Each 
interval represents a sequence of states, strictly alternating between program 
and environment actions (which may be a skip action). This work is radically 
different from ours since (1) their states are interpreted using a standard SC
semantics, and (2) their intervals represent an entire execution of a command as
well as the interference from the environment while executing that command.

Under SC, rely-guarantee was combined with separation logic [44,46], which
allows the powerful synergy of reasoning using stable invariants (as in
rely-guarantee) and ownership transfer (as in concurrent separation logic). It would be
interesting to study a combination of our RG framework with concurrent separation
logics for weak memory models, such as [43,45].

Other works have studied the decidability of verification for causal consis- 
tency models. In work preceding the potential-based SRA model [28], Abdulla 
et al. [1] show that verification under RA is undecidable. In other work, Abdulla 
et al. [3] show that the reachability problem under TSO remains decidable for 
systems with dynamic thread creation. Investigating this question under SRA is 
an interesting topic for future work. 

Finally, the spirit of our generic approach is similar to Iris [22], Views [18], 
Ogre and Pythia [7], the work of Ponce de León et al. [34], and recent axiomatic 
characterizations of weak memory reasoning [19], which all aim to provide a 
generic framework that can be instantiated to underlying semantics. 

In the future we are interested in automating the reasoning in Piccolo, starting
from automatically checking the validity of program derivations (using, e.g., SMT
solvers for specialised theories of sequences or strings [24,42]) and, more
ambitiously, synthesizing appropriate Piccolo invariants.
Abstract. Existing dynamic partial order reduction (DPOR) algorithms
scale poorly on concurrent data structure benchmarks because
they visit a huge number of blocked executions due to spinloops.

In response, we develop AWAMOCHE, a sound, complete, and strongly 
optimal DPOR algorithm that avoids exploring any useless blocked exe- 
cutions in programs with await and confirmation-CAS loops. Conse- 
quently, it outperforms the state-of-the-art, often by an exponential fac- 
tor. 


1 Introduction 


Dynamic partial order reduction (DPOR) [13] has been promoted as an effective 
verification technique for concurrent programs: starting from a single execution 
of the program under test, DPOR repeatedly reverses the order of conflicting 
accesses in order to generate all (meaningfully) different program executions. 

Applying DPOR in practice, however, reveals a major performance and scal- 
ability bottleneck: it explores a huge number of blocked executions, often out- 
numbering the complete program executions by an exponential factor. Blocked 
executions most commonly occur in programs with spinloops, i.e., loops that do 
not make progress unless some condition holds. Such loops are usually trans- 
formed into assume statements [14,18], effectively requiring that the loop exits 
at its first iteration (and blocking otherwise). 

We distinguish three classes of such blocked executions. 

The first class occurs in programs with non-terminating spinloops, such as 
a program awaiting x > 42 in a context where x = 0. For this program,
modeled as the statement assume(x > 42), DPOR obviously explores a blocked 
execution as the only existing value for x violates the assume condition. Such 
blocked executions should be explored because they indicate program errors. 

The second class occurs in programs with await loops. To see how such loops 
lead to blocked executions, consider the following program under sequential
consistency (SC) [23] (initially x = y = 0),


x := 2              ||  y := 2
assume(y < 1)       ||  assume(x < 1)


© The Author(s) 2023 
C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 230-250, 2023. 
https://doi.org/10.1007/978-3-031-37706-8_12 
where each assume models an await loop, e.g., do a := y while (a > 1) for the
assume of the first thread. Suppose that DPOR executes this program in a left-to- 
right manner, thereby generating the interleaving x := 2, assume(y < 1), y := 2. 
At this point, assume(x < 1) cannot be executed, since x would read 2. Yet, 
DPOR cannot simply abort the exploration. To generate the interleaving where 
the first thread reads y = 1, DPOR must consider the case where the read of x 
is executed before the x := 2 assignment. In other words, DPOR has to explore 
blocked executions in order to generate non-blocked ones. 
The third class occurs in programs with confirmation-CAS loops such as: 


do
  a := x
  b := f(a)
while (¬CAS(x, a, b))

which is modeled as:

a := x
b := f(a)
assume(CAS(x, a, b))


Consider a program comprising two threads running the code above, with a and 
b being local variables. Suppose that DPOR first obtains the (blocked) trace
where both threads concurrently try to perform their CAS: a1 := x, a2 := x,
CAS(x, a1, b1), CAS(x, a2, b2). Trying to satisfy the blocked assume of thread 2 by
reversing the CAS instructions is fruitless because then thread 1 will be blocked. 

In this paper, we show that exploring blocked executions of the second and 
third classes is unnecessary. 

We develop AWAMOCHE, a sound, complete, and optimal DPOR algorithm 
that avoids generating any blocked executions for programs with await and 
confirmation-CAS loops. Our algorithm is strongly optimal in that no explo- 
ration is wasted: it either yields a complete execution or a termination violation. 
AWAMOCHE extends TruSt [15], an optimal DPOR algorithm that supports weak 
memory models and has polynomial space requirements, with three new ideas: 


1. AWAMOCHE identifies certain reads as stale, meaning that they will never be 
affected by a race reversal due to TruSt’s maximality condition on reversals, 
and avoids exploring any executions that block on stale-read values. 

2. To deal with await loops, since it cannot completely avoid generating execu- 
tions with blocking reads, AWAMOCHE revisits such executions in place if a 
same-location write is later encountered. If no such write is found, then the 
blocked execution witnesses a program termination bug [21,25]. 

3. To effectively deal with confirmation-CAS loops, AWAMOCHE only consid- 
ers executions where the confirmation succeeds, by reversing not only races 
between conflicting instructions, but also speculatively revisiting traces with 
two reads reading from the same write event to enable a later in-place revisit. 


As we shall see in Sect. 5, supporting these DPOR modifications is by no means 
trivial when it comes to proving correctness and (strong) optimality. Indeed, 
TruSt’s correctness proof proceeds in a backward manner, assuming a way to 
determine the last event that was added to a given trace. The presence of in-place 
and speculative revisits, however, makes this impossible. 
We therefore develop a completely different proof that works in a forward
manner: from each configuration that is a prefix of a complete trace, we construct 
a sequence of steps that will lead to a larger configuration that is also a prefix 
of the trace. Our proof assumes that same-location writes are causally ordered, 
which invariably holds in correct data structure benchmarks, but is otherwise 
more general than TruSt’s, assuming less about the underlying memory model.

Our contributions can be summarized as follows: 


Section 2 We describe how and why DPOR encounters blocked executions. 

Section 3 We intuitively present AWAMOCHE’s three novel key ideas: stale reads, 
in-place revisits, and speculative revisits. 

Section 4 We describe our algorithm in detail in a memory-model-agnostic frame- 
work. 

Section 5 We generalize TruSt’s proof and prove AWAMOCHE sound, complete, 
and strongly optimal. 

Section 6 We evaluate AWAMOCHE, and demonstrate that it outperforms the 
state-of-the-art, often by an exponential factor. 


2 DPOR and Blocked Executions 


Before presenting AWAMOCHE, we recall the fundamentals of DPOR (Sect. 2.1), 
and explain why spinloops lead to blocked explorations (Sect. 2.2). 


2.1 Dynamic Partial Order Reduction 


DPOR algorithms verify a concurrent program by enumerating a representa- 
tive subset of its interleavings. Specifically, they partition the interleavings into 
equivalence classes (two interleavings are equivalent if one can be obtained from 
the other by reordering independent instructions), and strive to explore one 
interleaving per equivalence class. Optimal algorithms [2,15] achieve this goal. 

DPOR algorithms explore interleavings dynamically. After running the pro- 
gram and obtaining an initial interleaving, they detect racy instructions (i.e., 
instructions accessing the same variable with at least one of them being a write), 
and proceed to explore an interleaving where the race is reversed. 

Let us clarify the exploration procedure with the following example, where 
both variables x and y are initialized to zero. 


if (x = 0)          ||  x := 1
  y := 1            ||  x := 2                    (RW+WW)


The RW+WW program has 5 interleavings that can be partitioned into 3 equiv- 
alence classes. Intuitively, the y := 1 is irrelevant because the program contains 
no other access to y; all that matters is the ordering among the x accesses. 
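
The counts quoted above can be checked with a small enumeration. The following Python sketch is only an illustration and not part of any DPOR tool: the function name and the event labels rx, wy, w1, w2 are chosen here, and interleavings are grouped simply by the value that the read of x observes, which is assumed to coincide with the equivalence classes for this particular program.

```python
def interleavings():
    """Enumerate every SC interleaving of RW+WW with the value read from x."""
    runs = []

    def step(trace, x, t1_pc, t1_read, t2_pc):
        moved = False
        # Left thread: pc 0 reads x; pc 1 executes y := 1 only if the read saw 0.
        if t1_pc == 0:
            moved = True
            step(trace + ["rx"], x, 1, x, t2_pc)
        elif t1_pc == 1 and t1_read == 0:
            moved = True
            step(trace + ["wy"], x, 2, t1_read, t2_pc)
        # Right thread: pc 0 executes x := 1; pc 1 executes x := 2.
        if t2_pc == 0:
            moved = True
            step(trace + ["w1"], 1, t1_pc, t1_read, 1)
        elif t2_pc == 1:
            moved = True
            step(trace + ["w2"], 2, t1_pc, t1_read, 2)
        # No thread can move: the interleaving is complete.
        if not moved:
            runs.append((tuple(trace), t1_read))

    step([], 0, 0, None, 0)
    return runs


runs = interleavings()
classes = {value for _, value in runs}
print(len(runs), "interleavings,", len(classes), "classes")
```

Grouping by the observed value works here only because the two writes to x are ordered by program order; a general DPOR implementation compares the full dependency structure of the traces instead.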
The exploration steps for RW+WW can be seen in Fig. 1 (see footnote 1). DPOR obtains a
full trace of the program, while also recording the transitions that it took at each 


1 The exploration procedure has been simplified for presentational purposes. For a full 
treatment, please refer to [2,15]. 
[Figure 1: exploration traces of RW+WW. Each trace starts at init and lists the
transitions taken, (rx) if (...), (wy) y := 1, (w1) x := 1, (w2) x := 2, together
with the backtrack set recorded at each step.]

Fig. 1. A DPOR exploration of RW+WW 


step at the respective transition’s backtrack set (traces © to @). After obtaining
a full trace, it initiates a race-detection phase. During this phase, DPOR detects
the races between rx and the two writes w1 and w2. (While w1 and w2 also write
the same variable, they do not constitute a race, as they are causally related.)
For the first race, DPOR adds w1 to the backtrack set of the first transition, so
that it can subsequently execute w1 instead of rx. For the second one, while w2
is not in the backtrack set of the first transition, w2 cannot be directly executed
as the first transition without its causal predecessors (i.e., w1) having already
executed. Since w1 is already in the backtrack set of the first transition, DPOR
cannot do anything else, and the race-detection phase is over.

After the race-detection phase is complete, the exploration proceeds in an
analogous manner: DPOR backtracks to the first transition, fires w1 instead of
rx (trace @), re-runs the program to obtain a full trace (trace @), and initiates
another race-detection phase. During the latter, a race between rx and w2 is
detected, and w2 is inserted in the backtrack set of the second transition.

Finally, DPOR backtracks to the second transition, executes w2 instead of
rx (trace ©), and eventually obtains the full trace ©. During the last
race-detection phase of the exploration, DPOR detects the races between rx and the
two writes w1 and w2. As rx is already in the backtrack set of the first two
transitions, DPOR has nothing else to do, and thus concludes the exploration.

Observe that DPOR explored one representative trace from each equivalence 
class (traces ©, @, and ©). To avoid generating multiple equivalent interleav- 
ings, optimal DPOR algorithms extend the description above by restricting when 
a race reversal is considered. In particular, the TruSt algorithm [15] imposes a 
maximality condition on the part of the trace that is affected by the reversal. 
2.2 Assume Statements and DPOR


while (x = 0) {}  ||  x := 1              assume(x ≠ 0)  ||  x := 1
                  ||  x := 2                             ||  x := 2
           (RW+WW-L)                                (RW+WW-A)


Fig. 2. A variation of RW+WW with an await loop (left) and an assume (right)


To see how assume statements arise in concurrent programs, suppose that 
we replace the if-statement of RW+WW with an await loop (Fig. 2). Although 
the change does not really affect the possible outcomes for x, it makes DPOR 
diverge: DPOR examines executions where the loop terminates in 1, 2, 3, ... 
steps. Since, however, the loop has no side-effects, we can actually transform it 
into an assume() statement, effectively modeling a loop bound of one. 

Doing so guarantees DPOR’s termination but not its good performance. The 
reason is ascribed to the very nature of DPOR. Indeed, suppose that DPOR 
executes the first instruction of the left thread and then blocks due to the assume
statement. At this point, DPOR cannot simply stop the exploration due to the 
assume statement not being satisfied; it has to explore the rest of the program, 
so that the race reversals make the assume succeed. All in all, DPOR explores 
2 complete and 1 blocked traces for RW+WW-A. 

In general, DPOR cannot know whether some future reversal will ever make 
an assume succeed. Worse yet, it might be the case that there is an exponential 
number of traces to be explored (due to the other program threads), until DPOR 
is certain that the assume statement cannot be unblocked. 

To see this, consider the following program where RW+WW-A runs in par- 
allel with some threads accessing z: 


RW+WW-A || z := 1 || a1 := z || a2 := z || ... || aN := z        (RW+WW-A-PAR)


For the trace of RW+WW-A where the assume fails, DPOR fruitlessly explores 
$2^N$ traces in the hope that an access to x is found that will unblock the assume
statement. 
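
To make the blow-up concrete, the sketch below enumerates all SC interleavings of RW+WW-A-PAR for a small N and groups them by the values their reads observe. It is a brute-force illustration under stated assumptions, not a DPOR implementation: the helper names (`trace_classes`, `rw_ww_a_par`, `count_traces`) and the encoding of threads as lists of ("w", var, value) and ("r", var) operations are invented here, and two interleavings are treated as equivalent when every read sees the same value.

```python
def trace_classes(threads):
    """Return the distinct read-value assignments over all SC interleavings."""
    classes = set()

    def explore(pcs, memory, reads):
        finished = True
        for t, ops in enumerate(threads):
            if pcs[t] == len(ops):
                continue
            finished = False
            op = ops[pcs[t]]
            next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op[0] == "w":
                explore(next_pcs, {**memory, op[1]: op[2]}, reads)
            else:
                value = memory.get(op[1], 0)  # all variables start at 0
                explore(next_pcs, memory, reads + (((t, pcs[t]), value),))
        if finished:
            classes.add(frozenset(reads))

    explore(tuple(0 for _ in threads), {}, ())
    return classes


def rw_ww_a_par(n):
    """RW+WW-A in parallel with one writer and n readers of z."""
    return [
        [("r", "x")],                    # assume(x != 0): blocked iff it reads 0
        [("w", "x", 1), ("w", "x", 2)],
        [("w", "z", 1)],
    ] + [[("r", "z")] for _ in range(n)]


def count_traces(n):
    """Return (complete, blocked) trace counts for RW+WW-A-PAR."""
    classes = trace_classes(rw_ww_a_par(n))
    blocked = sum(1 for c in classes if dict(c)[(0, 0)] == 0)
    return len(classes) - blocked, blocked


print(count_traces(3))
```

For each N the blocked traces make up one third of all traces, i.e., $2^N$ of them, none of which can be unblocked by reordering the z accesses.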

Given that executing an assume statement that fails leads to blocked exe- 
cutions, one might be tempted to consider a solution where assume statements 
are only scheduled if they succeed. Even though such a solution would
eliminate blocking for RW+WW-A, it is not a panacea. To see why, consider a
variation of RW+WW-A where the first thread executes assume(x = 0) instead of
assume(x ≠ 0). In such a case, the assume can be scheduled first (as it succeeds),
but reversing the races among the x accesses will lead to blocked executions. It 
becomes evident that a more sophisticated solution is required. 


3 Key Ideas 


AWAMOCHE, our optimal DPOR algorithm, extends TruSt [15] with three novel 
key ideas: stale-read annotations (Sect. 3.1), in-place revisits (Sect. 3.2), and
speculative revisits (Sect. 3.3). As we will shortly see, these ideas guarantee that
AWAMOCHE is strongly optimal: it never initiates fruitless explorations, and all
explorations lead to executions that are either complete or denote termination 
violations. In the rest of the paper, we call such executions useful. 


3.1 Avoiding Blocking Due to Stale Reads 


Race reversals are at the heart of any DPOR algorithm. TruSt distinguishes two 
categories of race reversals: (1) write-read and write-write reversals, (2) read- 
write reversals. While the former category can be performed by modifying the 
trace directly in place (called a “forward revisit”), the latter may require
removing events from the trace (called a “backward revisit”). To ensure optimality for
backward revisits, TruSt checks a certain maximality condition for the events 
affected by them, namely the read, which will be reading from a different write, 
and all events to be deleted. 

An immediate consequence is that any read events not satisfying TruSt’s 
maximality condition, which we call stale reads, will never be affected by a 
subsequent revisit. As an example, consider the following program with a read 
that blocks if it reads 0: 


x := 1 || assume(x = 1)                    (W+R)


After obtaining the trace x := 1; assume(x = 1), TruSt forward-revisits the read 
in-place, and makes it read 0. At this point, we know that (1) the assume will fail, 
and (2) that both the read and the events added before it cannot be backward- 
revisited, due to the read reading non-maximally (which violates TruSt’s maxi- 
mality condition). As such, no useful execution is ever going to be reached, and 
there is no point in continuing the exploration. 

Leveraging the above insight, we make AWAMOCHE immediately drop traces 
where some assume is not satisfied due to a stale read. To do this, AWAMOCHE 
automatically annotates reads followed by assume statements with the condition 
required to satisfy the assume, and discards all forward revisits that do not satisfy 
the annotation. 
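
The filtering step can be pictured as follows. This is a minimal sketch under stated assumptions rather than AWAMOCHE's actual code: the function `readable_writes`, the representation of writes as (name, value) pairs listed in coherence order, and the decision to always keep the co-latest write (because a read blocked on it is not stale and may denote a termination violation) are illustrative choices made here.

```python
def readable_writes(writes_in_co_order, annotation):
    """Keep the writes a read annotated with `annotation` may usefully read from.

    writes_in_co_order: list of (name, value) pairs, oldest write first.
    annotation: predicate on values, taken from the assume that follows the read.
    """
    last = len(writes_in_co_order) - 1
    return [
        (name, value)
        for i, (name, value) in enumerate(writes_in_co_order)
        # Drop non-latest writes whose value would make the assume fail.
        if annotation(value) or i == last
    ]


# W+R: the read is annotated with "x = 1", so reading the initial 0 is dropped.
print(readable_writes([("init", 0), ("w1", 1)], lambda v: v == 1))
```

With an unsatisfiable annotation such as `lambda v: v == 42`, only the latest write survives, which matches the intuition that a read blocked on the latest value is the only blocked case still worth reporting.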

Even though stale-read annotations are greatly beneficial in reducing block- 
ing, they are merely a remedy, not a cure. As already mentioned, they are only 
leveraged in write-read reversals, and are thus sensitive to DPOR’s exploration 
order. To completely eliminate blocking, AWAMOCHE performs in-place and spec- 
ulative revisits, described in the next sections. 


3.2 Handling Await Loops with In-Place Revisits 


AWAMOCHE’s solution to eliminate blocking is to not blindly reverse all races 
whenever a trace is blocked, but rather to only try and reverse those that might 
unblock the exploration. 

As an example, consider RW+WW-A-PAR (Fig. 3). After AWAMOCHE obtains
the first full trace, it detects the races among the z accesses, as well as the (rx, w1)
[Figure 3: three traces of RW+WW-A-PAR over the events (rx) asm(x), (w1) x := 1,
(w2) x := 2, (z1) z := 1, (a1) a1 := z, ..., (aN) aN := z, each annotated with its
backtrack sets.]

Fig. 3. Key steps in AWAMOCHE’s exploration of RW+WW-A-PAR 


[Figure 4: three traces, each starting at init, over the events (rx) asm(x),
(w1) x := 1, (w2) x := 2, followed by the z accesses (z1), ..., (zn), each
annotated with its backtrack sets.]

Fig. 4. An AWAMOCHE exploration of RW+WW 


race. (Recall that AWAMOCHE is based on TruSt and therefore does not consider 
the (rx, w2) race in this trace.) At this point, a standard DPOR would start
reversing the races among the z accesses. Doing so, however, is wasteful, since 
reversing races after the blockage will lead to the exploration of more blocked 
executions. 

Instead, AWAMOCHE chooses to reverse the (rx, w1) race (as this might make
the assume succeed), and completely drops the races among the z accesses. 


We call this procedure in-place revisiting (denoted by a dedicated arrow in Fig. 3). Intuitively,
ignoring the z races is safe to do as they will have the chance to manifest in the 
trace where the (rx, w1) race has been reversed.

Indeed, reversing the (rx, w1) race does make the assume succeed, at which point
the exploration proceeds in the standard DPOR way. AWAMOCHE explores $2^N$
traces where the read of x reads 1, and another $2^N$ where it reads 2. Note that,
even though in this example AWAMOCHE explores 2/3 of the traces that standard
DPOR explores, as we show in Sect. 6 the difference can be exponential.
[Figure 5: traces over the events (r1) a1 := x, (c1) CAS(x, a1, b1), (r2) a2 := x,
(c2) CAS(x, a2, b2), connected by revisit arrows.]


Fig. 5. An AWAMOCHE exploration of the confirmation-CAS example. 


Suppose now that we change the assume(x) in RW+WW-A-PAR to assume(x
= 42) so that there is no trace where the assume is satisfied. The key steps of
AWAMOCHE’s exploration can be seen in Fig. 4. Upon obtaining a full trace, all
races on z are ignored and AWAMOCHE revisits rx in place. Subsequently, as the
assume is still not satisfied, AWAMOCHE again revisits rx in place (trace @). At
this point, since there are no other races on x it can reverse, AWAMOCHE reverses
all the races on z, and finishes the exploration.

In total, AWAMOCHE explores $2^N$ blocked executions for the updated example,
which are all useful. As rx is reading from the latest write to x in all these
executions and the assume statement (corresponding to an await loop) still blocks,
each of these executions constitutes a distinct liveness violation.


3.3 Handling Confirmation CASes with Speculative Revisits 


In-place revisiting alone suffices to eliminate useless blocking in programs whose 
assume statements arise only due to await loops. It does not, however, eliminate 
blocking in confirmation-CAS loops. Confirmation-CAS loops consist of a spec- 
ulative read of some shared variable, followed by a (possibly empty) sequence of 
local accesses and other reads, and a confirmation CAS that only succeeds if it 
reads from the same write as the speculative read. 

As an example, consider the confirmation-CAS example from Sect. 1 and
a trace where both reads read the initial value, the CAS of the first thread 
succeeds, and the CAS of the second thread reads the result of the CAS of the 
first. Although this trace is blocked and explored by DPOR (since the CAS read 
of the second thread is reading from the latest, same-location write), it does not 
constitute an actual liveness violation. In fact, even though the CAS read that 
blocks does read from the latest, same-location write, the r := x read in the 
same loop iteration does not. In order for a blocked trace (involving a loop) to 
be an actual liveness violation, all reads corresponding to a given iteration need 
to be reading the latest value, and not just one. 

To avoid exploring blocked traces altogether for cases like this, we equip
AWAMOCHE with some built-in knowledge about confirmation-CAS loops and
treat them specially when reversing races. To see how this is done, we present a 
run of AWAMOCHE on the confirmation-CAS example of Sect. 1 (see Fig. 5). 
While building the first full trace (trace @), another big difference between
AWAMOCHE and standard DPOR algorithms is visible: AWAMOCHE does not 
maintain backtrack sets for confirmation CASes. Indeed, there is no point in 
reversing a race involving a confirmation CAS, as such a reversal will make the 
CAS read from a different write than the speculative read, and hence lead to an 
assume failure. 

After obtaining the first full trace (trace @)), AWAMOCHE initiates a race- 
detection phase. At this point, the final big difference between AWAMOCHE 
and previous DPORs is revealed. AWAMOCHE will not reverse races between 
reads and CASes, but rather between speculative reads. (While speculative reads 
are not technically conflicting events, they conflict with the later confirmation- 
CASes.) As can be seen in trace @), AWAMOCHE schedules the speculative read 
of the second thread before that of the first thread so that it explores the sce- 
nario where the confirmation of the second thread succeeds before the one of the 
first. 

Finally, simply by adding the remaining events of the second thread before 
the ones of the first thread, AWAMOCHE explores the second and final trace of 
the example (trace @), while avoiding having blocked traces altogether. 
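
The claim that only two traces of this example are worth visiting can be cross-checked by brute force. The sketch below is an illustration under stated assumptions, not AWAMOCHE itself: it enumerates every SC interleaving of two threads running a := x; assume(CAS(x, a, a + 1)), where f(a) = a + 1 is a hypothetical choice of update function, and groups interleavings by which write each read and CAS reads from.

```python
DONE, BLOCKED = 2, 3


def with_slot(values, t, new_value):
    """Return a copy of the tuple `values` with position t replaced."""
    return values[:t] + (new_value,) + values[t + 1:]


def explore():
    """Return (complete, blocked) sets of reads-from classes for two threads."""
    complete, blocked = set(), set()

    def step(pcs, x, local, rf):
        # x is (id of the latest write, value); local[t] is thread t's copy a.
        moved = False
        for t in (0, 1):
            if pcs[t] == 0:    # speculative read a := x
                moved = True
                step(with_slot(pcs, t, 1), x, with_slot(local, t, x[1]),
                     rf + ((f"r{t}", x[0]),))
            elif pcs[t] == 1:  # confirmation CAS(x, a, a + 1)
                moved = True
                new_rf = rf + ((f"c{t}", x[0]),)
                if x[1] == local[t]:
                    step(with_slot(pcs, t, DONE), (f"c{t}", local[t] + 1),
                         local, new_rf)
                else:          # the assume fails and the thread blocks
                    step(with_slot(pcs, t, BLOCKED), x, local, new_rf)
        if not moved:
            (blocked if BLOCKED in pcs else complete).add(frozenset(rf))

    step((0, 0), ("init", 0), (None, None), ())
    return complete, blocked


complete, blocked = explore()
print(len(complete), "complete classes,", len(blocked), "blocked classes")
```

The two complete classes correspond to the two orders in which the threads can run their read and CAS back to back; the blocked classes are exactly those in which both speculative reads observe the initial value.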


4 Await-Aware Model Checking Algorithm 


AWAMOCHE is based on TruSt [15], a state-of-the-art stateless model checking 
algorithm that explores execution graphs [9], and thus seamlessly supports weak 
memory models. In what follows, we formally define execution graphs (Sect. 4.1), 
and then present AWAMOCHE (Sect. 4.2). 


4.1 Execution Graphs 


An execution graph G consists of a set of events (nodes), representing instruc- 
tions of the program, and a few relations of these events (edges), representing 
interactions among the instructions. 


Definition 1. An event, $e \in \mathsf{Event}$, is either the initialization event init, or
a thread event $(t, i, lab)$ where $t \in \mathsf{Tid}$ is a thread identifier, $i \in \mathsf{Idx} \triangleq \mathbb{N}$ is a
serial number inside each thread, and $lab \in \mathsf{Lab}$ is a label that takes one of the
following forms:


- Block label: B representing the blockage of a thread (e.g., due to the condition
of an “assume” statement failing).

- Error label: error representing the violation of some program assertion.

- Write label: $\mathsf{W}^{k_w}(l, v)$ where $k_w \subseteq \mathsf{Wattr} \triangleq \{\mathsf{excl}\}$ denotes special attributes the
write may have (i.e., exclusive), $l \in \mathsf{Loc}$ is the location accessed, and $v \in \mathsf{Val}$
the value written.

- Read label: $\mathsf{R}^{k_r}(l)$ where $k_r \subseteq \mathsf{Rattr} \triangleq \{\mathsf{awt}, \mathsf{spec}, \mathsf{excl}\}$ denotes special
attributes the read may have (i.e., await, speculative, exclusive), and $l \in \mathsf{Loc}$ is
the location accessed. We note that if a read has the awt or the spec attribute,
then it cannot have any other attribute.
We omit the $\emptyset$ for read/write labels with no attributes. The functions tid, idx,
loc, and val, respectively return the thread identifier, serial number, location,
and value of an event, when applicable. We use R, W, B, and error to denote
the set of all read, write, block, and error events, respectively, and assume that
$\mathsf{init} \in \mathsf{W}$. We use superscripts and subscripts to further restrict those sets (e.g.,
$\mathsf{W}_l \triangleq \{\mathsf{init}\} \cup \{w \in \mathsf{W} \mid \mathsf{loc}(w) = l\}$).

In the definition above, read and write events come with various attributes. 
Specifically, we encode successful CAS operations and other similar atomic oper- 
ations, such as fetch-and-add, as two events: an exclusive read followed by an 
exclusive write (both denoted by the excl attribute). Moreover, we have a spec
attribute for speculative reads, and write $\mathsf{R}^{\mathsf{conf}}$ for the corresponding
confirmation reads (i.e., the first exclusive, same-location read that is po-after a given
$r \in \mathsf{R}^{\mathsf{spec}}$). Finally, we have the awt attribute for reads the outcome of which
is tied to an assume statement, and write $\mathsf{R}^{\mathsf{blk}}$ for the subset of $\mathsf{R}^{\mathsf{awt}}$ that are
reading a value that makes the assume fail (see below).


Definition 2. An execution graph G consists of: 


1. a set G.E of events that includes init and does not contain multiple events
with the same thread identifier and serial number,

2. a total order $\leq_G$ on G.E, representing the order in which events were
incrementally added to the graph,

3. a function $G.\mathsf{rf} : G.\mathsf{R} \to G.\mathsf{W}$, called the reads-from function, that maps each
read event to a same-location write from where it gets its value, and

4. a strict partial order $G.\mathsf{co} \subseteq \bigcup_{l \in \mathsf{Loc}} G.\mathsf{W}_l \times G.\mathsf{W}_l$, called the coherence order,
which is total on $G.\mathsf{W}_l$ for every location $l \in \mathsf{Loc}$.


We write G.R for the set $G.\mathsf{E} \cap \mathsf{R}$ and similarly for other sets. Given two events
$e_1, e_2 \in G.\mathsf{E}$, we write $e_1 <_G e_2$ if $e_1 \leq_G e_2$ and $e_1 \neq e_2$. We write $G|_E$ for the
restriction of an execution graph G to a set of events E, and $G \setminus E$ for the graph
obtained by removing a set of events E.


Based on the above graph representation, we define G.po, which orders events 
in the same thread according to their i component, and porf, which is the causal 
order among the graph events, as follows: 


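\[
\begin{array}{rcl}
G.\mathsf{po} &\triangleq& \{(\mathsf{init}, e) \mid e \in G.\mathsf{E} \setminus \{\mathsf{init}\}\} \\
 && \cup\ \{(e, e') \in G.\mathsf{E} \times G.\mathsf{E} \mid \mathsf{tid}(e) = \mathsf{tid}(e') \land \mathsf{idx}(e) < \mathsf{idx}(e')\} \\
G.\mathsf{porf} &\triangleq& (G.\mathsf{po} \cup G.\mathsf{rf})^{+}
\end{array}
\]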
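
The two relations can be computed directly from their definitions. The following Python sketch is illustrative only and makes simplifying assumptions: events are either the string "init" or a (tid, idx) pair with labels omitted, rf is given as a dictionary from reads to the writes they read from, and the function names are invented here.

```python
def po(events):
    """Program order: init before everything, then per-thread index order."""
    rel = {("init", e) for e in events if e != "init"}
    rel |= {
        (a, b)
        for a in events
        for b in events
        if a != "init" and b != "init" and a[0] == b[0] and a[1] < b[1]
    }
    return rel


def transitive_closure(rel):
    """Smallest transitive relation containing rel."""
    closure = set(rel)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra


def porf(events, rf):
    """Causal order: transitive closure of po together with reads-from edges."""
    return transitive_closure(po(events) | {(w, r) for r, w in rf.items()})


# Thread 1 performs two writes; thread 2's first event reads from (1, 1).
events = ["init", (1, 0), (1, 1), (2, 0), (2, 1)]
order = porf(events, {(2, 0): (1, 1)})
print(((1, 0), (2, 1)) in order)
```

Checking that no pair (e, e) appears in the result is one way to test the porf-acyclicity assumption mentioned below.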


The semantics of a program P under a memory model m is the set of execution 
graphs corresponding to the program that satisfy the consistency predicate of 
m. Consistency predicates generally constrain the possible choices of co and rf, 
thereby indirectly constraining the possible final values of memory locations and 
the values that reads can return. 

TruSt (and, by extension, AWAMOCHE) assumes some properties on the memory
model [15]: porf acyclicity, porf-prefix-closedness, co-maximal-extensibility.
Intuitively, extensibility captures the idea that executing a program should never 
get stuck if a thread has more statements to execute. 
Algorithm 1. AWAMOCHE’s exploration algorithm


 1: procedure VERIFY(P)
 2:   VISIT_P($G_\emptyset$)

 3: procedure VISIT_P(G)
 4:   if $\neg\mathsf{consistent}_m(G) \lor \exists b \in G.\mathsf{R}^{\mathsf{blk}}.\ \neg\mathsf{maximal}(G, b)$ then return
 5:   switch $a \leftarrow \mathsf{next}_P(G)$ do
 6:     G ← G ++ a
 7:     case $a = \bot$
 8:       return “Visited full execution graph G”
 9:     case $a \in \mathsf{error}$
10:       exit(“error”)
11:     ...
12:     ...
13:     ...
14:     case $a \in \mathsf{R} \setminus \mathsf{R}^{\mathsf{conf}}$
15:       for $w \in G.\mathsf{W}_{\mathsf{loc}(a)}$ do
16:         ...
17:         ...
18:         ...
19:         VISIT_P(SetRF(G, a, w))
20:     case $a \in \mathsf{W}$
21:       if WWRace(G) then exit(“Write-write race”)
22:       VISIT_P(IPR(G, a))
23:       Revs ← $G.\mathsf{R}_{\mathsf{loc}(a)} \setminus \mathsf{dom}(G.\mathsf{porf}; [a])$
24:       MAYBEBACKWARDREVISIT_P(G, Revs, a)
25:     case _
26:       VISIT_P(G)


4.2 Awamoche 


Similarly to TruSt, AWAMOCHE verifies a concurrent program P by enumerat- 
ing all of its consistent execution graphs (see Algorithm 1). In contrast to TruSt, 
however, AWAMOCHE is strongly optimal: it never explores an execution G where
there exists some blocked read $r \in G.\mathsf{R}^{\mathsf{blk}}$ that is reading from a non-co-maximal
write. In other words, AWAMOCHE only visits graphs that lead to useful
executions (see footnote 2). In order to be able to do so, AWAMOCHE makes stronger assumptions on
the underlying memory model m, namely that there are no write-write races,
and that m does not allow porf to contradict co (i.e., that $\mathsf{co} \subseteq \mathsf{porf}$).

Next, we first describe how TruSt works, and then proceed with AWAMOCHE’s 
modifications.

Given a program P, VERIFY visits all consistent execution graphs of P by 
calling VISIT on the execution graph Gg containing only the initialization event. 


2 Recall that blocked reads that read from maximal writes are useful, as they denote
liveness violations. 
At each step (Line 4), as long as the current graph remains consistent under
the specified memory model m, VISIT obtains a new event a via $\mathsf{next}_P(G)$
(Line 5), and extends the current graph G with a (Line 6). We assume that G ++ a
adds a to G.E, and also to G.co, in case a is a write. (Recall that $\mathsf{co} \subseteq \mathsf{porf}$ and
so a’s co-placing is unique.)

If there are no more events to add to the graph, then G is complete, and 
VISIT returns (Line 7). If a denotes an error, then it is reported to the user and 
verification terminates (Line 9). 

If a is a read, VISIT needs to examine all possible places where a could 
read from. To that end, for each same-location write w in G (Line 15), VISIT 
recursively explores the possibility that a reads from w (Line 19). Formally,
SetRF(G, r, w) returns a graph G' that is identical to G except for its rf
component:

\[ G'.\mathsf{rf} = G.\mathsf{rf} \setminus (G.\mathsf{E} \times \{r\}) \cup \{(w, r)\} \]


If a is a write, VISIT examines both the case when a is simply added to G
(Line 22) and the “backward-revisit” cases for each existing same-location read
in G that could read from a (Line 24). When a backward-revisits a read r, the
resulting graph G' only contains the events that were added before r, or are
porf-before a, and updates r to read from a. Since, however, there might be many
backward revisits that lead to the exact same graph G', to ensure optimality,
G' is visited only when the current graph G forms a maximal extension of G'.
We do not provide TruSt’s definition of maximal extensions here, as AWAMOCHE 
modifies it to achieve strong optimality. 

Let us now move to the parts of Algorithm 1 that are AWAMOCHE-specific. 

First, AWAMOCHE discards all graphs where some blocked read is reading 
non-maximally (Line 4). As explained in Sect. 3.2, such reads cannot be revisited
and will thus only lead to blocked executions. In addition, to guarantee
correctness, AWAMOCHE raises an error if it detects unordered writes (Line 21). 

Second, whenever a write event a is added, AWAMOCHE revisits all same-location
blocked reads in place, making them read from a (Line 22) and
excluding them from the normal backward-revisit procedure (Line 23).
Formally, we define IPR(G, a) to return a graph G' that is
identical to G apart from its rf component:

\[ G'.\mathsf{rf} = G.\mathsf{rf} \setminus (G.\mathsf{E} \times G.\mathsf{R}^{\mathsf{blk}}_{\mathsf{loc}(a)}) \cup (\{a\} \times G.\mathsf{R}^{\mathsf{blk}}_{\mathsf{loc}(a)}) \]

Third, whenever a confirmation read a is added (Line 11), i.e., an exclusive read
that succeeds an unmatched speculative read e, AWAMOCHE only explores the
execution where a reads from the same write as e (Line 13): any other write
would make the confirmation CAS fail.

Fourth, whenever a speculative read a is added to read from a candidate
write w and there is another speculative read b reading from the same write w
(Line 16), AWAMOCHE backward-revisits b to read from a. Note that, due to the
atomicity of the confirming CASes, there can be at most one other speculative
read b reading from w, and so AWAMOCHE revisits it to read from a, making it
blocked, so that it gets revisited in place when the confirming CAS of a is added
to the graph. (To ensure graph well-formedness, we assume that IPR(G, b) does
Algorithm 2. AWAMOCHE’s backward-revisit algorithm
1: procedure MAYBEBACKWARDREVISIT_P(G, Revs, a)
2:   for $r \in$ Revs do
3:     $[d_1, \ldots, d_n] \leftarrow \mathsf{sort}_{\leq_G}(\{e \in G.\mathsf{E} \mid r <_G e \land (e, a) \notin G.\mathsf{porf}\})$
4:     if $\exists G', G''$ such that $G' \xrightarrow{r} G'' \xrightarrow{d_1} \cdots \xrightarrow{d_n} G \setminus \{a\}$ and $r \notin G''.\mathsf{R}^{\mathsf{blk}}$ then
5:       VISIT_P(IPR(SetRF(G' ++ [r, a], r, a), a))


not modify G when called with a read argument b, and that SetRF(G, b, _) makes 
b read from $\bot$, which IPR also considers.)
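
Read as operations on the reads-from relation, SetRF and IPR are small set manipulations. The sketch below is a hedged illustration, not the tool's implementation: rf is modelled as a set of (write, read) pairs, event names are plain strings, and the set of blocked same-location reads (the set written $G.\mathsf{R}^{\mathsf{blk}}_{\mathsf{loc}(a)}$ above) is assumed to be supplied by the caller instead of being computed from the graph.

```python
def set_rf(rf, r, w):
    """SetRF: make read r read from write w, leaving all other edges intact."""
    return {(src, dst) for (src, dst) in rf if dst != r} | {(w, r)}


def ipr(rf, a, blocked_reads):
    """IPR: redirect every blocked same-location read to the new write a."""
    kept = {(src, dst) for (src, dst) in rf if dst not in blocked_reads}
    return kept | {(a, r) for r in blocked_reads}


# rx is blocked on the initial write; rz reads another location and is untouched.
rf = {("init", "rx"), ("init", "rz")}
print(sorted(ipr(rf, "w1", {"rx"})))
```

When exactly one read is blocked, the in-place revisit coincides with a single SetRF call; the difference is that IPR redirects all blocked same-location reads at once and removes nothing else from the graph.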

Finally, similarly to TruSt, AWAMOCHE only performs a backward revisit if 
G forms a maximal extension, though AWAMOCHE employs a slightly different 
definition of maximal extensions. AWAMOCHE’s backward-revisit algorithm can 
be seen in Algorithm 2. 

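Roughly, AWAMOCHE performs a backward revisit from a to r that leads to
a graph $\mathsf{IPR}(G_r, a)$ if, starting from $G_r$ without r and a, and adding r and all
the deleted events in a co-maximal way (and performing in-place revisits along
the way), leads to G. Formally, we write $G_1 \xrightarrow{e} G_2$ if there exists $G_1'$ such that
$G_2 = \mathsf{IPR}(G_1', e)$, $G_1' = G_1$ ++ $e$ and:


\[
\begin{array}{lll}
G_1'.\mathsf{rf} = G_1.\mathsf{rf} \cup \{(\max_{G_1.\mathsf{co}_{\mathsf{loc}(e)}}, e)\} & G_1'.\mathsf{co} = G_1.\mathsf{co} & \text{if } e \in \mathsf{R} \\
G_1'.\mathsf{rf} = G_1.\mathsf{rf} & G_1'.\mathsf{co} = G_1.\mathsf{co} \cup \{(w, e) \mid w \in G_1.\mathsf{W}_{\mathsf{loc}(e)}\} & \text{if } e \in \mathsf{W} \\
G_1'.\mathsf{rf} = G_1.\mathsf{rf} & G_1'.\mathsf{co} = G_1.\mathsf{co} & \text{otherwise}
\end{array}
\]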


We note that, for the special case where $e \in \mathsf{R}^{\mathsf{spec}}$ and there is $e' \in G_1.\mathsf{R}^{\mathsf{spec}}_{\mathsf{loc}(e)}$
such that e' is not followed by the matching confirmation CAS, we consider ⊥ as the
$\max_{G_1.\mathsf{co}_{\mathsf{loc}(e)}}$. As a final remark, note that AWAMOCHE modifies next_P(G) so that (a)
after scheduling a speculative read, it keeps scheduling events in the same thread
until the respective confirming CAS is added, and (b) it does not schedule events
from a thread whose last (speculative) read reads ⊥. These modifications ensure
that the confirmation patterns are added one at a time, and that in-place revisits
take place among confirming CASes and speculative reads.
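
To make the definitions above concrete, the following Python sketch models an execution graph as a handful of dictionaries and implements IPR together with one co-maximal extension step. It is a toy illustration rather than AWAMOCHE's implementation: the Graph class, its field names, and the explicit blocked set are assumptions made for this example, an initial write per location is assumed to exist, and the special case for unconfirmed speculative reads is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy execution graph (hypothetical representation for illustration)."""
    kind: dict = field(default_factory=dict)     # event -> "R" or "W"
    loc: dict = field(default_factory=dict)      # event -> memory location
    rf: dict = field(default_factory=dict)       # read -> write it reads from
    co: dict = field(default_factory=dict)       # location -> writes, co-earliest first
    blocked: set = field(default_factory=set)    # reads whose assume currently fails


def ipr(g, a):
    """In-place revisit: redirect every blocked read on loc(a) to read from a."""
    if g.kind[a] != "W":
        return g  # assumption from the text: a read argument leaves G unchanged
    for r in g.blocked:
        if g.loc[r] == g.loc[a]:
            g.rf[r] = a  # drops the old rf edge of r and adds (a, r)
    return g


def co_max_step(g, e, kind, loc):
    """Add event e co-maximally, then perform in-place revisits (mutates g)."""
    g.kind[e], g.loc[e] = kind, loc
    if kind == "R":
        g.rf[e] = g.co[loc][-1]               # read from the co-maximal write
    elif kind == "W":
        g.co.setdefault(loc, []).append(e)    # place e after all same-location writes
    return ipr(g, e)


g = Graph()
co_max_step(g, "init", "W", "x")
co_max_step(g, "r", "R", "x")    # r reads from init
g.blocked.add("r")               # suppose the await condition fails on init's value
co_max_step(g, "w", "W", "x")    # adding w revisits r in place
print(g.rf["r"])                 # w
```

Whether r is still blocked after the revisit depends on the value written by w; the sketch leaves the blocked set for the caller to maintain.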


5 Correctness and Optimality 


Proving AWAMOCHE correct is non-trivial, as we had to develop a novel proof
strategy. In what follows, we first review TruSt's proof argument and show why it
is inapplicable to AWAMOCHE. Then, we explain our proof strategy (Sect. 5.1)
and state our completeness and optimality results (Sect. 5.2).


5.1 Approaches to Correctness 


TruSt. The proof of TruSt proceeds in a backward manner. Specifically, TruSt's
proof is based on a procedure PREV that, given an execution G, recovers the
unique "previous" execution $G_p$ that the algorithm must reach in order to visit
G. To do so, assuming a left-to-right addition order of events, PREV(G) finds
the rightmost porf-maximal event e of G, and decides whether e was added in
a non-revisit step, or e is a read that was just revisited by a write event located
to its right. If e was added in a non-revisit step, then $G_p$ is simply G without e.
Otherwise, PREV obtains $G_p$ from G in the following way: it removes e along with
the write w that e reads from, and then iteratively adds the leftmost available
event to G in a co-maximal way, until w is about to be added.

Fig. 6. TruSt: In-place revisits make it impossible to determine the last step taken

TruSt’s completeness and optimality are proved using PREV. For the former, 
one can show that each consistent final execution can reach the initial empty 
execution through a series of PREV steps, and each of these steps is matched by 
a forward step of TruSt. For the latter, one can show that each step of TruSt is 
matched by the (unique) PREV step.

To see why we cannot follow a similar approach for AWAMOCHE, consider
the program of Fig. 6, along with one of its executions. We will show that
in-place revisits make it impossible to trace the algorithm's last step merely by
inspecting the execution. Assuming a left-to-right addition order, AWAMOCHE
will reach this execution as follows: it first adds R(x), R(y) and W(x, 1) (notice that
at this point the first read is blocked), then revisits R(x) in place, and finally adds
W(y, 1) and backward-revisits R(y). This last revisit, however, creates a problem:
TruSt's proof assumes that a backward revisit (r, w) implies that w is located
to the right of r, which is clearly not the case here. The fact that in AWAMOCHE
backward revisits can happen in both directions makes it impossible to trace
the algorithm's last step simply by inspecting an execution.


Awamoche. In contrast to TruSt, AWAMOCHE's proof proceeds in a forward
fashion. For each consistent final execution $G_f$, we show (1) which steps are taken
by the algorithm in order to reach $G_f$, and (2) that these are the only possible
ones that lead to $G_f$. To do so, we first define a notion of a prefix: we say that
an execution G is a prefix of G' (written G ⊑ G') if G' can be reached from G
with a series of operational steps. In turn, we define an operational step to be a
step that the algorithm may take in the non-revisit case (without demanding it
is the one actually taken by the algorithm), that may perform in-place revisits
as well.

Using this notion of prefixes, our proof defines a procedure SUCCS that, given
a consistent execution $G_f$ and an execution G produced by the algorithm such
that G ⊏ $G_f$, returns the minimal sequence of algorithm steps that reach
some execution G' for which G ⊏ G' ⊑ $G_f$. Concretely, if next_P(G) can
be added to G such that the resulting execution G' is a prefix of $G_f$, SUCCS
returns this addition step. Otherwise, next_P(G) is a read event r that must be
first revisited by an event e in order to reach an execution that is a prefix of $G_f$.
SUCCS then returns the sequence of algorithm steps that reach the execution
resulting from extending G with the porf-prefix of e and setting r to read from
e (or from ⊥, if e is a speculative read). Both completeness and optimality follow
from SUCCS's properties, as well as from the observation that every consistent
final execution can be reached by a series of operational steps.


5.2 Awamoche: Completeness, Optimality, and Strong Optimality 


Before stating our results, we first formally define useful executions. Recall that 
these are executions where all blocking reads corresponding to await loops are 
reading maximally (such executions denote liveness violations), and no confir- 
mation CAS fails. 


Definition 3. A consistent execution G is useful if every read in $G.\mathsf{R}^{\mathsf{blk}}$ reads
from a G.co-maximal write and no confirmation CAS fails.


Next, we define the class of input programs that satisfy our assumptions. 


Definition 4. A program P is well-formed if every speculative read is followed 
by a confirmation CAS with no write in-between, and all writes to locations 
accessed by speculative reads write distinct values. 


Completeness and Optimality. Completeness guarantees that every useful 
final execution is explored. AWAMOCHE is complete for well-formed programs 
that do not exhibit write-write races. 


Theorem 1 (Completeness). Given a well-formed program P, VERIFY(P)
either detects a write-write race and exits, or visits every useful final execution
of P.


Optimality states that (1) no equivalent final executions are explored, and (2)
there are no fruitless explorations that never lead to a consistent final execution.


Definition 5. We call an execution G visited by AWAMOCHE fruitless if it does
not recursively lead to any VISIT(P, $G_f$) call, for any consistent final execution
$G_f$.


AWAMOCHE is optimal for well-formed programs. 


Theorem 2 (Optimality). Given a well-formed program P, (1) VERIFY(P)
never visits two equivalent final executions, and (2) if VISIT(P, G) directly leads
to a call to VISIT(P, G') with G being fruitless, then VISIT(P, G') will not initiate
any other VISIT calls.
Observe that in the optimality theorem above, fruitless exploration can lead
to an extra VISIT step. The reason for that is the treatment of CASes: the read 
part of a CAS c can be added so that it reads from the same write as a different 
(successful) CAS. In such a case, there is no way to consistently add the pending 
write of c without revisiting, which in turn may not be able to happen due to 
AWAMOCHE’s maximality condition. 


Strong Optimality. Strong optimality states that, apart from being opti- 
mal, only useful executions are visited. AWAMOCHE is strongly-optimal for well- 
formed programs. 


Theorem 3 (Strong Optimality). Given a well-formed program P,
VERIFY(P) only visits useful executions.


6 Evaluation 


We implemented AWAMOCHE as a tool that verifies C/C++ programs under 
the RC11 memory model [22]. Similarly to other stateless model checkers, 
AWAMOCHE works at the level of the LLVM Intermediate Representation 
(LLVM-IR). 

In what follows, we evaluate the effectiveness of AWAMOCHE's key ideas
(namely, stale-read annotations, in-place revisiting and speculative revisiting)
both individually and as a whole. To that end, we evaluate AWAMOCHE on a set
of benchmarks that amplify the weaknesses of standard DPOR and also
demonstrate the applicability of our approach to realistic workloads. In all our
tests, we compare AWAMOCHE against a vanilla version of TruSt, a version of
TruSt that employs stale-read annotations (TruSt_STALE), and a version of TruSt
that employs both stale-read annotations and in-place revisiting (TruSt_IPR).

Even though there are other stateless model checking tools that can be used
to verify C/C++ programs (namely, GENMC [19] and NIDHUGG [1]), we do
not compare against them here, as we care about AWAMOCHE's performance
compared to TruSt. We only mention in passing that we expect GENMC's
performance to be similar to that of TruSt_STALE (as its implementation
incorporates various optimizations for assume statements), and NIDHUGG's similar to
TruSt_IPR (as it employs an optimization with a similar effect to in-place
revisiting [14]). We also note that comparing with NIDHUGG is difficult since it operates
under a different memory model, and does not transform the same types of loops
to assume statements as AWAMOCHE (also see Sect. 7).

We draw two major conclusions from our evaluation. First, AWAMOCHE’s 
optimization yields exponential performance benefits compared to standard 
DPOR approaches. Second, these benefits do not only apply to small synthetic 
benchmarks, but also extend to realistic concurrent data structures. 


Experimental Setup. We conducted all experiments on a Dell PowerEdge M620
blade system, running a custom Debian-based distribution, with two Intel Xeon
E5-2667 v2 CPUs (8 cores @ 3.3 GHz), and 256 GB of RAM. We used LLVM
11.0.1 for AWAMOCHE. Unless explicitly noted otherwise, all reported times are
in seconds. We set a timeout limit of 30 min.
Table 1. Synthetic benchmarks

                             |     TruSt      |  TruSt_STALE   |   TruSt_IPR    |    AWAMOCHE
Benchmark       | Executions | Blocked | Time | Blocked | Time | Blocked | Time | Blocked | Time
orch-run(4)     |          1 |      15 | 0.01 |       0 | 0.01 |       0 | 0.01 |       0 | 0.01
orch-run(5)     |          1 |      31 | 0.01 |       0 | 0.01 |       0 | 0.01 |       0 | 0.01
orch-run(6)     |          1 |      63 | 0.01 |       0 | 0.01 |       0 | 0.01 |       0 | 0.01
wait-workers(4) |         24 |      96 | 0.03 |      96 | 0.02 |       0 | 0.01 |       0 | 0.01
wait-workers(5) |        120 |     600 | 0.09 |     600 | 0.09 |       0 | 0.03 |       0 | 0.03
wait-workers(6) |        720 |    4320 | 0.56 |    4320 | 0.56 |       0 | 0.14 |       0 | 0.14
nr+nw(3,2)      |          0 |      27 | 0.01 |      10 | 0.03 |       1 | 0.01 |       1 | 0.01
nr+nw(5,4)      |          0 |    3125 | 0.1  |     126 | 0.03 |       1 | 0.01 |       1 | 0.01
nr+nw(6,5)      |          0 |   46656 | 1.32 |     462 | 0.06 |       1 | 0.01 |       1 | 0.01
conf-loop(4)    |         24 |     256 | 0.04 |     176 | 0.03 |     124 | 0.03 |       0 | 0.01
conf-loop(5)    |        120 |    3905 | 0.09 |    2010 | 0.10 |    1185 | 0.06 |       0 | 0.02
conf-loop(6)    |        720 |   75156 | 1.40 |   26916 | 0.96 |   13086 | 0.54 |       0 | 0.08

orch-run: N threads are spawned and wait to be signaled before they start performing
thread-local computations.

wait-workers: A worker thread waits for N workers to publish their results before it
starts running.

nr+nw: A synthetic benchmark where K reader threads wait until a variable written L
times by a writer thread satisfies some condition (which cannot be satisfied).

conf-loop: N threads perform a confirmation-CAS loop similar to the one of Sect. 1.


6.1 Results 


Let us first focus on some benchmarks that help us better understand where each
of AWAMOCHE's components can be applied (Table 1). Starting with orch-run,
we see that even though blocked executions greatly outnumber complete
executions, stale-read annotations alone suffice to bring the number of blocked
executions down to zero. This, however, is partly due to luck: in orch-run,
main() spawns a number of workers that do not execute until they are signaled
by main() using a special variable. In turn, because TruSt_STALE follows a
left-to-right scheduling, when DPOR encounters the worker threads, the scenario where
they are not signaled is not considered, since it implies reading a stale value.

By contrast, in wait-workers and nr+nw, stale-read annotations are
insufficient to eliminate blocking. In these benchmarks, some designated threads wait
for the rest of the workers to perform some tasks before proceeding. However, it
is not guaranteed that DPOR will always process these designated threads
after the rest of the threads, and thus stale-read annotations have
little to no effect. Employing in-place revisiting, on the other hand, leads to a
dramatic performance improvement: blocked executions are effectively
eliminated (the single blocked execution in nr+nw is a liveness violation).
Table 2. Real-world benchmarks

                             |      TruSt      |   TruSt_STALE   |    TruSt_IPR    |    AWAMOCHE
Benchmark       | Executions | Blocked | Time  | Blocked | Time  | Blocked | Time  | Blocked | Time
mpmc-enq(4)     |        576 |    1084 |  0.25 |     710 |  0.22 |     532 |  0.17 |       0 |  0.2
mpmc-enq(5)     |       7200 |   31325 |  4.12 |   16382 |  3.27 |   12205 |  2.72 |       0 |  1.48
mpmc-enq(6)     |      86400 |  730626 | 82.28 |  303362 | 51.29 |  227766 | 42.14 |       0 | 19.71
treiber-push(4) |         24 |     256 |  0.07 |     176 |  0.04 |     124 |  0.04 |       0 |  0.04
treiber-push(5) |        120 |    3905 |  0.41 |    2010 |  0.29 |    1185 |  0.19 |       0 |  0.05
treiber-push(6) |        720 |   75156 |  7.49 |   26916 |  3.61 |   13086 |  1.85 |       0 |  0.23
m-enq(4)        |         24 |     124 |  0.05 |     124 |  0.04 |     124 |  0.04 |       0 |  0.02
m-enq(5)        |        120 |    1185 |  0.11 |    1185 |  0.14 |    1185 |  0.13 |       0 |  0.04
m-enq(6)        |        720 |   13086 |  1.04 |   13086 |  1.05 |   13086 |  1.18 |       0 |  0.24

mpmc-enq: N threads enqueue an item in a multiple-producer multiple-consumer queue.
treiber-push: A lock-free stack implementation. N threads are pushing an item.
m-enq: A modification of the Michael-Scott queue without the tail pointer. N threads
are enqueueing an item.


Analogously to wait-workers and nr+nw, conf-loop demonstrates why
in-place revisiting is insufficient when the success of an assume does not depend
on a single load, but rather on a sequence of actions (as is the case in
confirmation loops). As can be seen, TruSt_IPR still explores blocked executions, which
AWAMOCHE manages to eliminate thanks to speculative revisits.
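
The growth of blocked executions in Table 1 can be checked with a few lines of arithmetic. The closed forms in the Python sketch below are not given in the evaluation; they are guesses that happen to reproduce the "Blocked" column of plain TruSt for the three reported sizes of each benchmark, so they should be read as a sanity check on the trend and not as derived results.

```python
from math import factorial

# "Blocked" counts reported for plain TruSt in Table 1.
orch_run = {4: 15, 5: 31, 6: 63}
wait_workers = {4: 96, 5: 600, 6: 4320}
nr_nw = {(3, 2): 27, (5, 4): 3125, (6, 5): 46656}

# Guessed closed forms (not from the text); each is checked on three data points only.
orch_ok = all(b == 2**n - 1 for n, b in orch_run.items())
workers_ok = all(b == n * factorial(n) for n, b in wait_workers.items())
nr_nw_ok = all(b == (l + 1) ** k for (k, l), b in nr_nw.items())

print(orch_ok, workers_ok, nr_nw_ok)  # True True True
```

Under these fits, the number of blocked executions explored by plain TruSt grows exponentially or factorially in the benchmark parameter, while AWAMOCHE explores at most one blocked execution in the same rows.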

Moving to the final part of our evaluation, Table 2 demonstrates that the
benefits of AWAMOCHE extend to realistic workloads as well. As can be seen from 
Table 1, none of AWAMOCHE’s optimizations is redundant, as they are often all 
required to eliminate the exploration of blocked executions. Observe, however, 
that our benchmarks only exercise push or enqueue operations. This is because 
the respective pop or dequeue operations contain assume statements in their 
confirmation-CAS loops, and therefore cannot be optimized by AWAMOCHE. 
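
To quantify the trend in Table 2, the short Python sketch below computes the running-time ratio of plain TruSt over AWAMOCHE for the two larger sizes of each benchmark. The numbers are copied from the table; the dictionary layout and the helper name speedup are choices made for this illustration only.

```python
# (benchmark, N) -> (TruSt time, AWAMOCHE time), in seconds, copied from Table 2.
times = {
    ("mpmc-enq", 5): (4.12, 1.48),
    ("mpmc-enq", 6): (82.28, 19.71),
    ("treiber-push", 5): (0.41, 0.05),
    ("treiber-push", 6): (7.49, 0.23),
    ("m-enq", 5): (0.11, 0.04),
    ("m-enq", 6): (1.04, 0.24),
}


def speedup(bench, n):
    """Ratio of TruSt's running time to AWAMOCHE's for one table row."""
    trust, awamoche = times[(bench, n)]
    return trust / awamoche


for bench in ("mpmc-enq", "treiber-push", "m-enq"):
    print(bench, round(speedup(bench, 5), 1), round(speedup(bench, 6), 1))
```

For all three data structures the ratio is larger at N = 6 than at N = 5, which is the behavior one would expect if the benefit grows with the number of threads.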


7 Related Work 


The seminal work of Flanagan and Godefroid [13] has spawned a number of 
papers on DPOR. Among these, OPTIMAL-DPOR [2] and TruSt [15] stand out, 
as they provide the first optimal DPOR algorithm, and the first optimal DPOR 
algorithm with polynomial memory consumption, respectively. TruSt is based 
on [17] and thus has the extra advantage of being parametric in the choice of 
the underlying weak memory model. 

A lot of works improve on DPOR one way or another. Many techniques 
introduce coarser equivalence partitionings to combat the state-space explosion 
problem (e.g., [3,6-8,10-12]). Other works focus on extending it to weak memory 
models [1,4,5,17,20,24], while others try to leverage particular programming
patterns [14,16,18]. Kokologiannakis, Ren, and Vafeiadis [18] in particular, deal
with transforming spinloops into assume statements, the handling of which we 
optimize in this paper. 
Among those, the work that is closest to ours is GODOT [14]. GODOT is an
extension to DPOR that has a similar effect to in-place revisiting in the sense 
that it only explores executions that are either complete, or denote program 
termination errors. That said, GODOT only works under SC, and cannot handle 
stale-read annotations or confirmation loops (which are instrumental in scaling 
the verification of concurrent data structures, as we saw in Sect. 6). In addition, 
GODOT's loop transformation is static (in contrast to AWAMOCHE's, which is
dynamic), making it easy to construct examples where GODOT’s transformation 
does not work. Finally, even though GODOT does not impose a “no write-write 
race” restriction on the input programs, this restriction is trivially satisfied for 
models like SC or TSO [26]: in such models, it is sound to transform writes to 
atomic exchange statements that write the value they read, thereby ordering all 
writes to each location. 


8 Conclusion 


We presented AWAMOCHE, the first memory-model-agnostic DPOR algorithm 
that is sound, complete, and strongly optimal for programs with await and 
confirmation-CAS loops. AWAMOCHE avoids blocked executions that arise due 
to await loops by revisiting blocking reads in-place, and deals with confirmation- 
CAS loops by also considering revisits whenever two speculative reads read from 
the same write. 

As our theoretical and experimental results demonstrate, AWAMOCHE yields 
exponential benefits over the current state-of-the-art. Yet, it does not support 
certain more advanced patterns commonly appearing in concurrent programs, 
the handling of which we leave as future work. Examples of such patterns include 
confirmation-CAS loops with assume statements between the speculative and the 
confirmation reads (such statements may arise due to break/continue instruc- 
tions), elimination backoff data structures, and await loops that use CASes 
instead of plain reads. We also believe that our key ideas for achieving strong 
optimality in these cases should be applicable in other scenarios as well, such as 
in programs with mutual exclusion locks or transactions. 
Abstract. We present a major new version of Scenic, a probabilistic
programming language for writing formal models of the environments of 
cyber-physical systems. Scenic has been successfully used for the design 
and analysis of CPS in a variety of domains, but earlier versions are lim- 
ited to environments that are essentially two-dimensional. In this paper, 
we extend Scenic with native support for 3D geometry, introducing new 
syntax that provides expressive ways to describe 3D configurations while 
preserving the simplicity and readability of the language. We replace 
Scenic’s simplistic representation of objects as boxes with precise mod- 
eling of complex shapes, including a ray tracing-based visibility system 
that accounts for object occlusion. We also extend the language to sup- 
port arbitrary temporal requirements expressed in LTL, and build an 
extensible Scenic parser generated from a formal grammar of the lan- 
guage. Finally, we illustrate the new application domains these features 
enable with case studies that would have been impossible to accurately 
model in Scenic 2. 


Keywords: Scenario description language · Synthetic data ·
Probabilistic programming · Automatic test generation · Simulation


1 Introduction 


A major challenge in the design of cyber-physical systems (CPS) like autonomous 
vehicles is the heterogeneity and complexity of their environments. Increasingly, 
problems of perception, planning, and control in such environments have been 
tackled using machine learning (ML) algorithms whose behavior is not well- 
understood. This trend calls for verification techniques for ML-based CPS; how- 
ever, a significant barrier has been the difficulty of constructing formal models 
that capture the diversity of these systems' environments [25]. Indeed, building
such models is a prerequisite not only for verification but any formal analysis. 

Scenic [10,12] is a probabilistic programming language that addresses this 
challenge by providing a precise yet readable formalism for modeling the environ- 
ments of CPS. A Scenic program defines a scenario describing physical objects 
in a world, placing a probability distribution on their positions and other prop- 
erties; a single program can generate many different concrete scenes by sampling 
from this distribution. Scenic also allows defining a stochastic policy describing 
how agents behave over time, and implementing the resulting dynamic scenarios 
in a variety of external simulators. Environment models defined in Scenic can be 
used for many tasks: falsification, as in the VerifAI toolkit [5], but also debugging, 
training data generation, and real-world experiment design [13]. These tasks have 
been successfully demonstrated in a variety of domains including autonomous 
driving [29], aviation [9], and reinforcement learning agents [1]. 

Despite Scenic’s successes, it has several limitations that prevent its use in a 
number of applications of interest. First, the original language models the world 
as being two-dimensional, since this enables a substantial simplification in the 
language’s syntax (e.g., orientations being a single angle) as well as optimiza- 
tions in its implementation. The 2D assumption is reasonable for domains such 
as driving but leaves Scenic unable to properly model environments for aerial 
and underwater vehicles, for example. There can be problems even for ground 
vehicles: Scenic could not generate a scene where a robot vacuum is underneath 
a table, as their 2D bounding boxes would overlap and Scenic would treat them 
as colliding. The use of bounding boxes rather than precise shapes also leads 
Scenic to use a simplistic visibility model that ignores occlusion, making it pos- 
sible for Scenic to claim objects are visible when they are not and vice versa: a 
serious problem when generating training data for a perception system. 

Fundamentally, verification of AI-based autonomous systems requires
reasoning about perception and physics in a 3D world. To support such reasoning,
a formal environment modeling language must provide faithful representations
of 3D geometry. Towards this end, we present Scenic 3.0¹, a largely
backwards-compatible major release featuring:


- Native 3D Syntax: We update Scenic's existing syntax to support 3D
  geometry, and add new syntax making it possible to define complex 3D scenarios
  simply. For example, an object's orientation can be specified as being tangent
  to a surface and facing another object as much as possible.

- Precise 3D Shapes: The shapes of objects (as well as surfaces and volumes)
  can be given by arbitrary 3D meshes, with Scenic performing precise reasoning
  about collisions, containment, tangency, etc.

- Precise Visibility: We use ray tracing for precise visibility checks that take
  occlusion into account.

- Temporal Requirements: We support arbitrary Linear Temporal Logic [21]
  properties to constrain dynamic scenarios (vs. only Gp and Fp in Scenic 2).

- Rewritten Parser: We give a Parsing Expression Grammar [8] for Scenic,
  using it to generate a parser with more precise error messages and better
  support for new syntax and optimization passes.

¹ Available at: https://github.com/BerkeleyLearnVerify/Scenic/.


We first define the new features in Scenic 3 in detail in Sect. 2, working 
through several toy examples. Then, in Sect. 3, we describe two case studies 
using Scenic with scenarios that could not be accurately modeled without the 
new features: falsifying a specification for a robot vacuum and generating training 
data constrained by an LTL formula for a self-driving car’s perception system. 


Related Work. There are many tools for test and data generation [3]. Some 
approaches learn from examples [7,26] and so do not provide specific control 
over scenarios as Scenic does. Approaches based on rules or grammars [17, 20, 
26] provide some control but have difficulty enforcing requirements over the 
generated data as a whole. Several probabilistic programming languages have 
been used for generation of objects and scenes [15,22,23], but none of them 
provide specialized syntax to lay out geometric scenarios, nor for describing 
dynamic behaviors. Finally, there has been work on synthetic data generation 
of 3D scenes and objects using ML techniques such as GANs (e.g., [7,14,30]), 
but these lack the specificity and controllability provided by a programming 
language like Scenic. 


2 New Features 


2.1 3D Geometry 


The primary new feature in Scenic 3 is the generalization of the language to 3 
dimensions. Some changes, like changing the type system so that vectors have 
length 3, are obvious: here we focus on cases where the existing syntax of Scenic 
does not easily generalize, using simple scenarios to motivate our design choices. 
The first challenge when moving to 3D is the representation of an object’s 
orientation in space: Scenic’s existing heading property, providing a single angle, 
is no longer sufficient. Instead, we introduce yaw, pitch, and roll angles, using 
the common convention for aircraft that these represent intrinsic rotations (i.e., 
yaw is applied first, then pitch is applied to the resulting orientation, etc.). 
Using intrinsic angles makes it easy to compose rotations: for example if we 
point an airplane towards a landing strip with yaw and pitch (either manually 
or using Scenic’s facing toward specifier — more on this below), we can add 
an additional roll by adding to that property. To further simplify composition, 
we add a parentOrientation property which specifies the local coordinate sys- 
tem in which the 3 angles above should be interpreted (by default, the global 
coordinate system). This allows the user to specify an orientation with respect 
to a previously-computed orientation, for instance that of a tilted surface. 
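
The composition of intrinsic angles and of parentOrientation described above can be illustrated with plain rotation matrices. The Python sketch below is not Scenic's implementation; in particular, the axis convention (forward along +Y, yaw about Z, pitch about X, roll about Y) and the helper names rot, matmul, apply and orientation are assumptions made for this example.

```python
import math


def rot(axis, angle):
    """3x3 rotation matrix about the x, y, or z axis (right-handed)."""
    c, s = math.cos(angle), math.sin(angle)
    return {
        "x": [[1, 0, 0], [0, c, -s], [0, s, c]],
        "y": [[c, 0, s], [0, 1, 0], [-s, 0, c]],
        "z": [[c, -s, 0], [s, c, 0], [0, 0, 1]],
    }[axis]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, v):
    """Rotate the vector v by the matrix m."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def orientation(yaw, pitch, roll, parent=None):
    """Intrinsic yaw, then pitch, then roll, interpreted in the parent's frame."""
    local = matmul(matmul(rot("z", yaw), rot("x", pitch)), rot("y", roll))
    return local if parent is None else matmul(parent, local)


FORWARD = (0, 1, 0)  # assumed forward axis
nose = apply(orientation(math.radians(40), math.radians(-10), 0), FORWARD)
rolled = apply(orientation(math.radians(40), math.radians(-10), math.radians(25)), FORWARD)
# Adding roll spins the aircraft about its nose without changing where the nose points.
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(nose, rolled)))  # True
```

Because intrinsic rotations compose by multiplying on the right, a parent orientation is simply one more factor on the left, which is why an orientation relative to a tilted surface needs no special treatment in this sketch.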
Scenic provides a flexible system of natural language specifiers which can be 
combined to define properties of objects. Consider the following Scenic 3 code: 
    objectA = new Object at (1, 2, 3), facing (45 deg, 0, 90 deg)
    objectB = new Object left of objectA by 1
    objectC = new Object above objectB by 1,
        facing (Range(0,30) deg, Range(0,30) deg, 0)


Here, we use the at specifier to define a specific position for object A; 
the facing specifier defines the object’s orientation using explicit yaw, pitch, 
and roll angles. We then place object B left of A by 1 unit with the left of 
specifier: this specifier now not only sets the position property, but also sets 
the parentOrientation property to the orientation of object A (unless explic- 
itly overridden). Thus object B will be oriented the same way as A. Simi- 
larly, object C is positioned relative to B and so inherits its orientation as its 
parentOrientation. However, this time we use the facing specifier to define 
random yaw and pitch angles, so object C will face up to 30° off of B. 

Another way to specify an object’s orientation is the facing toward speci- 
fier. This is a case where the 2D semantics become ambiguous in 3D. Consider a 
scenario where the user wants an airplane to be “facing toward” a runway: the 
plane’s body should be oriented toward the runway (giving its yaw), but it is not 
clear whether in addition the plane should be pitched downward so that its nose 
points directly toward the runway. To allow for both interpretations, Scenic 3 
has facing toward only specify yaw, while the new facing directly toward 
specifier also specifies pitch. This is illustrated in Fig. 1. 
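
The difference between the two specifiers comes down to which angles are derived from the line of sight. The following Python sketch computes them for the positions used in Fig. 1; it is an illustration under an assumed convention (yaw 0 faces +Y and increases counterclockwise, positive pitch tilts the nose up), and the function names are hypothetical rather than part of Scenic.

```python
import math


def facing_toward(position, target):
    """Yaw only: rotate about the vertical axis until the target lies ahead."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    return math.atan2(-dx, dy)  # assumed convention: yaw 0 faces +Y, counterclockwise is positive


def facing_directly_toward(position, target):
    """Yaw and pitch: additionally tilt the nose up or down at the target."""
    dx, dy, dz = (t - p for t, p in zip(target, position))
    return math.atan2(-dx, dy), math.atan2(dz, math.hypot(dx, dy))


ego = (0, 0, 1.25)
yaw_only = facing_toward((2, 0, 0), ego)
yaw, pitch = facing_directly_toward((-2, 0, 0), ego)
print(round(math.degrees(yaw_only)), round(math.degrees(yaw)), round(math.degrees(pitch)))
# 90 -90 32
```

With these positions the first plane only turns to face the ball, while the second also pitches up by about 32 degrees so that its nose points at it.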

Another common practice in 3D space is to place one object on another. 
For example, we may want to place a chair on a floor, or a painting on a wall. 
Scenic’s existing on specifier, which sets the position of an object to be a 
uniformly random point in a given region, does not suffice for such cases because 
it would cause the chair to intersect the floor or the painting to penetrate the 
wall (or both). To fix this issue, we allow each object to define a base point, 
which on positions instead of the object’s center. The default base point is the 
bottom center of the object’s bounding box, suitable for cars and chairs for 
example; a Painting class could override this to be the back center. Finally, to 
enable placing objects on each other, objects can provide a topSurface property 
specifying the surface which is considered the “top” for the purposes of the on 
specifier. As before, there is a reasonable default (the upward-pointing faces of 
the object’s mesh) that can be overridden. This syntax is illustrated in Fig. 2. 
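
The effect of positioning the base point instead of the center is simple arithmetic, sketched below in Python for an upright, axis-aligned object. The function name center_for_base, the chair height, and the assumption that the floor of height 0.1 is centered at z = 0 are all made up for this example.

```python
import math


def center_for_base(surface_point, base_offset):
    """Center position that puts the object's base point exactly at surface_point.

    base_offset is the base point expressed relative to the object's center.
    """
    return tuple(p - o for p, o in zip(surface_point, base_offset))


floor_top = (1.0, 2.0, 0.05)                 # sampled point on the floor's top surface
chair_height = 1.0                           # hypothetical dimension
chair_base = (0.0, 0.0, -chair_height / 2)   # bottom center of the bounding box
chair_center = center_for_base(floor_top, chair_base)

# The chair's lowest point now touches the floor instead of sinking into it.
print(math.isclose(chair_center[2] - chair_height / 2, floor_top[2]))  # True
```

Placing the center at the sampled point instead would leave the lower half of the chair inside the floor, which is the problem described above.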

A final 3D complication arises when positioning objects on irregular surfaces. 
Consider a pair of cars driving up an uneven mountain road, with one 10m 
behind the other. We can use the ahead of specifier to place one car 10m ahead 
of the other, but then the car will penetrate the road due to its upward slope. 
Alternatively, the on specifier can correctly place the car so it is tangent to the 
road, but then we cannot directly specify the distance between the cars. The 
natural semantics here would be to combine the constraints from both specifiers, 
but this is illegal in Scenic 2 where a given property (such as position) can 
only be specified by a single specifier at a time. We enable this usage in Scenic 
3 by introducing the concept of a modifying specifier that modifies the value 
of a property already defined by another specifier. Specifically, if an object's
position is already specified, the on specifier will project that position down
onto the given surface. This is illustrated by the green chair in Fig. 3.

    ego = new Ball at (0, 0, 1.25)
    new Plane at (2, 0, 0), facing toward ego
    new Plane at (-2, 0, 0), facing directly toward ego

Fig. 1. Line-of-sight-based orientations in Scenic. The ego ball (highlighted green) is
placed above the origin, as seen by the RGB global coordinate axes, with one plane
facing towards the ego and another facing directly toward the ego. (Color figure online)

    floor = new Object with width 5, with length 5, with height 0.1
    ego = new Chair on floor

Fig. 2. A Scenic program placing a chair on a floor. The Z-axis of the global coordinate
axes protrudes from the floor, indicating which direction is up.

Note that the green chair is correctly upright on the floor even though it was 
positioned relative to the cube, and so should inherit parentOrientation from 
the cube as discussed above. In this situation, the user has provided no explicit 
orientation for the chair, and both below and on can provide one. To resolve this 
ambiguity, we introduce a specifier priority system, where specifiers have differ- 
ent priorities for the properties they specify (generalizing Scenic’s existing sys- 
tem where a specifier could specify a property optionally). In our example, below 
specifies position with priority 1 and parentOrientation with priority 3, while 
on specifies these with priorities 1 and 2 respectively. So both specifiers determine 
position (with on modifying the value from below as explained above), but on 
takes precedence over below when specifying parentOrientation. This yields 
the expected behavior while still allowing below to determine the orientation
when used in combination with specifiers other than on.
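
The resolution of the below/on example can be replayed with a small Python sketch. It encodes only what the paragraph above states (lower priority numbers win, and a modifying specifier adjusts a value set by another specifier); the data layout, the function name resolve, and the tie-breaking error are assumptions made for this illustration and do not describe Scenic's actual resolution code.

```python
def resolve(specifiers):
    """Decide, per property, which specifier sets it and which ones then modify it.

    Each specifier is a dict with a "name", "priorities" (property -> priority,
    where a lower number wins) and "modifies" (properties it can adjust when
    another specifier has already set them).
    """
    properties = {p for s in specifiers for p in s["priorities"]}
    plan = {}
    for prop in sorted(properties):
        candidates = [s for s in specifiers if prop in s["priorities"]]
        modifiers = [s for s in candidates if prop in s["modifies"]]
        setters = [s for s in candidates if prop not in s["modifies"]]
        if not setters:  # nothing to modify, so the first modifier sets the value itself
            setters, modifiers = modifiers[:1], modifiers[1:]
        best = min(s["priorities"][prop] for s in setters)
        winners = [s for s in setters if s["priorities"][prop] == best]
        if len(winners) > 1:
            raise ValueError(f"{prop} specified twice with priority {best}")
        plan[prop] = (winners[0]["name"], [m["name"] for m in modifiers])
    return plan


below = {"name": "below", "priorities": {"position": 1, "parentOrientation": 3}, "modifies": set()}
on = {"name": "on", "priorities": {"position": 1, "parentOrientation": 2}, "modifies": {"position"}}
print(resolve([below, on]))
# {'parentOrientation': ('on', []), 'position': ('below', ['on'])}
```

The output mirrors the text: below sets the position and on then projects it onto the surface, while on alone determines parentOrientation.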
    floor = new Object with width 5, with length 5, with height 0.1
    air_cube = new Object at (Range(-5,5), Range(-5,5), 3),
        facing (Range(0,360 deg), Range(0,30 deg), 0)
    new Chair below air_cube, with color (0,0,200) # blue chair
    ego = new Chair below air_cube, on floor # green chair

Fig. 3. A Scenic program placing a green chair on the floor under a rotated cube in
midair. A blue chair is placed directly under the cube for clarity. (Color figure online)


2.2 Mesh Shapes and Regions 


Scenic 2’s approximation of objects by their bounding boxes was adequate for 2D 
driving scenarios, for example, but is wholly inadequate in 3D, where objects are 
commonly far from box-shaped. For example, consider placing a chair tucked in 
under a table. Since the bounding boxes of these two objects intersect, Scenic 2 
would always reject this situation as a collision and try to generate a new scene, 
even if the chair and table are entirely separate. In Scenic 3, each object has a 
precise shape given by its shape property, which is set to an instance of the class 
Shape. The most general Shape class is MeshShape, which represents an arbitrary 
3D mesh and can be loaded from standard formats; classes for primitive shapes 
like spheres are provided for convenience. These shapes are used to perform 
precise collision and containment checks between objects and regions. 
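
To make the chair-and-table example concrete, the following Python sketch contrasts a bounding-box 
test with a test on the actual parts of each object. It is a simplified stand-in, not Scenic's 
mesh-based check: objects are assumed to be unions of axis-aligned boxes, and all names and 
dimensions are hypothetical. 

```python
def boxes_overlap(a, b):
    """Axis-aligned boxes given as ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] < bmax[i] and bmin[i] < amax[i] for i in range(3))


def bounding_box(parts):
    """Smallest axis-aligned box containing every part of an object."""
    mins = tuple(min(p[0][i] for p in parts) for i in range(3))
    maxs = tuple(max(p[1][i] for p in parts) for i in range(3))
    return (mins, maxs)


def shapes_collide(parts_a, parts_b):
    """Precise check for objects modeled as unions of boxes."""
    return any(boxes_overlap(p, q) for p in parts_a for q in parts_b)


# A table made of a top slab and four legs (illustrative dimensions).
table = [
    ((0.0, 0.0, 0.9), (2.0, 1.0, 1.0)),  # top
    ((0.0, 0.0, 0.0), (0.1, 0.1, 0.9)),  # legs
    ((1.9, 0.0, 0.0), (2.0, 0.1, 0.9)),
    ((0.0, 0.9, 0.0), (0.1, 1.0, 0.9)),
    ((1.9, 0.9, 0.0), (2.0, 1.0, 0.9)),
]
# A low, stool-like chair tucked in under the table top.
chair = [((0.7, 0.2, 0.0), (1.3, 0.8, 0.5))]

bbox_says_collision = boxes_overlap(bounding_box(table), bounding_box(chair))
precise_says_collision = shapes_collide(table, chair)
print(bbox_says_collision, precise_says_collision)
```

The bounding boxes intersect although no part of the chair touches any part of the table, which is 
exactly the situation a bounding-box approximation rejects. 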

Scenic also supports mesh regions, which can either represent surfaces or 
volumes in 3D space. For example, given a mesh representing an ocean we might 
want to sample on the surface for a boat or in the volume for a submarine. 

All meshes in Scenic are handled using Trimesh [4], a Python library for 
triangular meshes, which internally calls out to the tools Blender [27] and 
OpenSCAD [28] for several operations. These operations tend to be expensive, so 
Scenic uses several heuristics to cheaply determine simple cases; these can give 
a speedup of between 10x and 1000x when sampling scenes. 


2.3 Precise Visibility Model 


Scenic 2’s visibility system simply checks if the bounding box corners of objects 
are contained in the view cone of the viewing object, which is no longer adequate 
for 3D scenarios with complex shapes. Visibility checks are now done using ray 
tracing, and account for objects being able to occlude visibility. In addition to 
standard pyramidal view cones used for cameras, Scenic correctly handles wrap- 
around view regions such as those of common LiDAR sensors. Visibility checks 
use a configurable density of rays, and are optimized to only send rays in areas 
where they could feasibly hit the object. 
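
The idea of an occlusion-aware visibility check can be sketched in Python as follows. This is an 
illustrative simplification rather than Scenic's ray tracer: occluders are assumed to be 
axis-aligned boxes, view cones are ignored, the "density of rays" is modeled by a list of sample 
points on the target, and all names are hypothetical. 

```python
def segment_hits_box(origin, target, box):
    """Slab test: does the segment from origin to target pass through the box?"""
    bmin, bmax = box
    t_enter, t_exit = 0.0, 1.0  # restrict the ray to the segment
    for i in range(3):
        d = target[i] - origin[i]
        if d == 0:
            # Segment is parallel to this pair of faces.
            if not (bmin[i] <= origin[i] <= bmax[i]):
                return False
            continue
        t0 = (bmin[i] - origin[i]) / d
        t1 = (bmax[i] - origin[i]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def can_see(viewer, target_points, occluders):
    """Visible if at least one sampled target point has an unobstructed ray."""
    return any(
        not any(segment_hits_box(viewer, p, box) for box in occluders)
        for p in target_points
    )


viewer = (0.0, 0.0, 1.0)
building = ((4.0, -1.0, 0.0), (6.0, 1.0, 5.0))
one_ray = can_see(viewer, [(10.0, 0.0, 1.0)], [building])
two_rays = can_see(viewer, [(10.0, 0.0, 1.0), (10.0, 3.0, 1.0)], [building])
print(one_ray, two_rays)
```

With a single ray the target is reported as hidden behind the building; adding a second sample 
point finds an unobstructed line of sight, which is why the number of rays matters. 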


2.4 Temporal Requirements 


A key feature of Scenic is the ability to declaratively impose constraints on 
generated scenes using require statements. However, Scenic 2 only provides 
limited support for temporal requirements constraining how a dynamic scenario 
evolves over time, with the require always and require eventually state- 
ments. Slightly more complex examples, like “cars A and B enter the intersec- 
tion after car C”, require the user to explicitly encode them as monitors, which 
is error-prone and yields verbose hard-to-read imperative code: this property 
requires an 8-line monitor in [12]. 

Scenic 3 extends require to arbitrary properties in Linear Temporal 
Logic [21], allowing natural properties like this to be concisely expressed: 


require (carA not in intersection and carB not in intersection 
         until carC in intersection) 


The semantics of the operators always, eventually, next, and until are 
taken from RV-LTL [2] to properly model the finite length of Scenic simulations. 
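
As a rough illustration of evaluating such a requirement on a finite simulation trace, the Python 
sketch below implements a two-valued reading of until in which the right-hand side must become true 
before the trace ends. This is a deliberate simplification: RV-LTL itself uses a four-valued 
semantics, which is not modeled here, and the function name and trace encoding are assumptions. 

```python
def holds_until(trace, left, right):
    """True if `right` holds at some step and `left` holds at every earlier step."""
    for state in trace:
        if right(state):
            return True
        if not left(state):
            return False
    return False  # the trace ended before `right` ever held


def neither_a_nor_b(state):
    return not state["A"] and not state["B"]


def c_inside(state):
    return state["C"]


# Each state records which cars are currently in the intersection.
good_trace = [
    {"A": False, "B": False, "C": False},
    {"A": False, "B": False, "C": True},
    {"A": True, "B": True, "C": True},
]
bad_trace = [
    {"A": True, "B": False, "C": False},  # car A enters before car C
    {"A": True, "B": False, "C": True},
]
good = holds_until(good_trace, neither_a_nor_b, c_inside)
bad = holds_until(bad_trace, neither_a_nor_b, c_inside)
print(good, bad)
```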


2.5 Rewritten Parser 


For interoperability with Python libraries, Scenic is compiled to Python, and 
the original Scenic parser was implemented on top of the Python parser. This 
approach imposed serious restrictions on the language design (e.g., forcing non- 
intuitive operator precedences), made extending the parser difficult, and led to 
misleading error messages which pointed to the wrong part of the program. 

Scenic 3 uses a parser automatically generated from a Parsing Expression 
Grammar (PEG) [8] for the language. The parser is based on Pegen [24], the 
parser generator developed for CPython, and the grammar itself was obtained 
by extending the Python PEG. The new parser outputs an abstract syntax tree 
representing the structure of the original Scenic code (unlike the old parser), 
ensuring that syntax errors are correctly localized and simplifying the task of 
writing analysis and optimization passes for Scenic. 

This new parser gives us flexibility in designing and implementing the lan- 
guage. For example, we carefully assigned precedence to the four new temporal 
operators so that users can naturally express temporal requirements without 
unnecessary parentheses. There are additional benefits from having a precise 
machine-readable grammar for Scenic: for instance, as we wrote the grammar, 
we discovered ambiguities that had previously been unnoticed and made minor 
changes to the language to eliminate them. The grammar could also be used 
to fuzz test the compiler and other tools operating on Scenic programs. 


3 Case Studies 


In this section, we discuss two case studies in the robotics simulator Webots [19]. 
The code for both case studies is available in the Scenic GitHub repository [11]. 
The first case study, performing falsification of a robot vacuum, illustrates a 
domain that could not be modeled in Scenic 2 due to the lack of 3D support. 
The second case study, generating data constrained by an LTL formula for testing 
or training the perception system of an autonomous vehicle, is an example of 
how the new features in Scenic 3 can significantly improve effectiveness even in 
one of Scenic’s original target domains. 


3.1 Falsification of a Robot Vacuum 


In this example we evaluate the iRobot Create [16], a robot vacuum, on its 
ability to effectively clean a room filled with objects. We use a specification 
stating that the robot must clean at least a third of the room within 5 min: in 
Signal Temporal Logic [18], the formula $\varphi = F_{[0,300]}(\mathit{coverage} > 1/3)$. We use 
Scenic to generate a complete room and export it to Webots for simulation. The 
room is surrounded by four walls and contains two main sections: in the dining 
room section, we place a table of varied width and length randomly on the floor, 
with 3 chairs tucked in around it and another chair fallen over. In the living 
room section, we place a couch with a coffee table in front of it, both leaving 
randomly-sized spaces roughly the diameter of the robot vacuum. We then add 
a variable number of toys, modeled as small boxes, cylinders, cones, and spheres, 
placed randomly around the room; for a taller obstacle, we place a stack of 3 
box toys somewhere in the room. Finally, we place the vacuum randomly on the 
floor, and use Scenic’s mutate statement to add noise to the positions and yaw 
of the furniture. Several scenes sampled from this scenario are shown in Fig. 4. 
We tested the default controller for the vacuum against 0, 1, 2, 4, 8, and 
16-toy variants of our Scenic scenario, running 25 simulations for each variant. 
For each simulation, we computed the robustness value [6] of our spec $\varphi$. The 
average values are plotted in Fig. 5, showing a clear decline as the number of 
toys increases. Many of the runs actually falsified $\varphi$: up to 44% with 16 toys. 
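
For a specification of the form "eventually, within the first 300 s, coverage exceeds 1/3", the 
standard quantitative semantics takes the largest margin coverage - 1/3 observed in the time 
window. The Python sketch below computes this on sampled data; the function name and the two 
sample runs are made up for illustration and are not taken from the experiments. 

```python
def eventually_robustness(times, coverage, horizon=300.0, threshold=1 / 3):
    """Robustness of F_[0,horizon](coverage > threshold) on a sampled signal."""
    margins = [c - threshold for t, c in zip(times, coverage) if 0.0 <= t <= horizon]
    return max(margins)  # raises ValueError if no sample lies in the window


times = [0.0, 100.0, 200.0, 300.0, 400.0]
slow_run = [0.0, 0.1, 0.2, 0.3, 0.5]    # reaches 1/3 only after the deadline
fast_run = [0.0, 0.2, 0.4, 0.45, 0.5]   # reaches 1/3 well before the deadline

rob_slow = eventually_robustness(times, slow_run)
rob_fast = eventually_robustness(times, fast_run)
print(rob_slow, rob_fast)
```

A negative value means the run falsifies the specification, and a positive value means the run 
satisfies it with that margin. 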
There are several aspects of this example that would not be possible in Scenic 
2. First, the new syntax in Scenic 3 allows for convenient placement of objects, 
specifically the use of on in combination with left of and right of, to place 
the chairs on the appropriate side of the dining table but on the floor. Many 
of the objects are also above others and have overlapping bounding boxes, but 
because Scenic now models shapes precisely, it is able to properly register these 
objects as non-intersecting and place them in truly feasible locations (e.g., in 
Fig. 4, the toy under the dining table in the top left scene and the robot under 
the coffee table in the bottom right scene). 

Fig. 4. Several sampled scenes from the robot vacuum scenario. 

Fig. 5. Spec. robustness value vs. number of toys, averaged over 25 simulations. 


3.2 Constrained Data Generation for an Autonomous Vehicle 


In this example we generate instances of a potentially-unsafe driving scenario 
for use in training or testing the perception system of an AV. Consider a car 
passing in front of the AV in an intersection where the AV must yield, and so 
needs to detect the other car before it becomes too late to brake and avoid a 
collision. We want to generate time series of images labeled with whether or 
not the crossing car is visible, for a variety of different scenes with different city 

Fig. 6. Intersection simulation images, with visibility label for the crossing car: 
(a) 2 seconds: not visible; (b) 2.5 seconds: visible; (c) 4 seconds: visible; 
(d) 4.5 seconds: not visible. 


layouts to provide various openings and backdrops. Our scenario places both the 
ego car (the AV) and the crossing car randomly on the appropriate road ahead 
of the intersection. We place several buildings along the crossing road that block 
visibility, allowing some randomness in their position and yaw values. We also 
place several buildings completely randomly behind the crossing road to provide 
a diverse backdrop of buildings in the images. Finally, we want to constrain data 
generation to instances of this scenario where the crossing car is not visible until 
it is close to the AV, as these will be the most challenging for the perception 
system. Using the new LTL syntax, we simply write: 


require (not ego can see car) until distance to car < 75 


Figure 6 shows a simulation sampled from this scenario. In Scenic 2, the 
crossing car would be wrongly labeled as visible in image (a), since the occluding 
buildings would not be taken into account. This would introduce significant 
error into the generated training set, which in previous uses of Scenic had to 
be addressed by manually filtering out spurious images; this is avoided with the 
new system. 


4 Conclusion 


In this paper we presented Scenic 3, a major new version of the Scenic pro- 
gramming language that provides full native support for 3D geometry, a precise 
occlusion-aware visibility system, support for more expressive temporal opera- 
tors, and a rewritten extensible parser. These new features extend Scenic’s use 
cases for developing, testing, debugging, and verifying cyber-physical systems to 
a broader range of application domains that could not be accurately modeled in 
Scenic 2. Our case study in Sect. 3.1 demonstrated how Scenic 3 makes it easier 
to perform falsification for CPS with complex 3D environments. Our case study 
in Sect. 3.2 further showed that even in domains that could already be modeled in 
Scenic 2, like autonomous driving, Scenic 3 allows for significantly more precise 
specifications due to its ability to reason accurately about 3D orientations, colli- 
sions, visibility, etc.; these concepts are often relevant to the properties we seek 
to prove about a system or an environment we want to specify. We expect the 
improvements to Scenic we describe in this paper will impact the formal meth- 
ods community both by extending Scenic’s proven use cases in simulation-based 
verification and analysis to a much wider range of application domains, and by 
providing a 3D environment specification language which is general enough to 
allow a variety of new CPS verification tools to be built on top of it. 

In future work, we plan to develop 3D scenario optimization techniques (com- 
plementing the 2D methods Scenic already uses) and explore additional 3D appli- 
cation domains such as drones. We also plan to leverage the new parser to allow 
users to define their own custom specifiers and pruning techniques. 


Abstract. In this paper, we consider a model of generalized timed 
automata (GTA) with two kinds of clocks, history and future, that can 
express many timed features succinctly, including timed automata, event- 
clock automata with and without diagonal constraints, and automata 
with timers. 

Our main contribution is a new simulation-based zone algorithm for 
checking reachability in this unified model. While such algorithms are 
known to exist for timed automata, and have recently been shown for 
event-clock automata without diagonal constraints, this is the first result 
that can handle event-clock automata with diagonal constraints and 
automata with timers. We also provide a prototype implementation for 
our model and show experimental results on several benchmarks. To the 
best of our knowledge, this is the first effective implementation not just 
for our unified model, but even just for automata with timers or for 
event-clock automata (with predicting clocks) without going through a 
costly translation via timed automata. Last but not least, beyond being 
interesting in their own right, generalized timed automata can be used for 
model-checking event-clock specifications over timed automata models. 

Fig. 1. An automaton with clocks (left) and an automaton with timers (right) for the same constraints. 


1 Introduction 


The idea of adding real-time dynamics to formal verification models started as 
a hot topic of research in the 1980s [6,11]. Over the years, timed automata [8,9] 
have emerged as a leading model for finite-state concurrent systems with real-time 
constraints. Timed automata make use of clocks, real-valued variables which 
increase along with time. Constraints over clock values can be used as guards 
for transitions, and clocks can be reset to 0 along transitions. It is notable that 
the early works in this area made use of timers to deal with real time [13,22,32]. 
Timers are started by setting them to some initial value within a given interval. 
Their values decrease with time, and a timeout event can be used in transitions 
to detect the instant when the timers become 0. Quoting from [6], the shift from 
timers to clocks in timed automata, as we know them today, is attributed to the 
fact that: “apart from some technical conveniences in developing the emptiness 
algorithm and proving its correctness, the reformulation allows a simple syntactic 
characterization of determinism for timed automata”. Over the last thirty 
years, the study of timed automata has led to the development of rich theory 
and industry-strength verification tools. The use of clocks has also allowed for 
the extension of the model to more complex constraints and assignments to 
clocks in transitions [14,17]. Furthermore, considering more sophisticated rates 
of evolution for clocks gives yet another well-established model, that of hybrid 
automata [7]. 

When it comes to the reachability problem, timers do have some nice properties. 
Let us explain with an example. Figure 1 shows a timed automaton on the 
left, and an automaton with timers on the right, for the set of words $ab^*$ such 
that the time between every two consecutive letters is 1. The timed automaton sets 
clock $x$ to 0 and checks for the guard $x = 1?$ to enforce the timing constraint. 
The automaton with timers, on the right, sets a timer $t_x$ to 1, and asks for its 
expiry in the immediate next action. Clock $y$ and timer $t_y$ are not necessary for 
the required timing property, but we add them to illustrate a different aspect 
that we will describe now. To solve the reachability problem, a symbolic 
enumeration of the state space is performed. In the timed automaton, at state $q_1$, the 
enumeration gives constraints $y - x = n$ for every $n > 0$. Starting from $y - x = n$ 
and executing $b$ gives $y - x = n + 1$, due to the combination of guard $x = 1?$ 
and reset $x := 0$. This shows that a naive symbolic enumeration is not bound 
to terminate. The question of developing finite abstractions for timed automata 
has been a central problem of study which started in the late 90s and continues 
to this day (see recent surveys [18,38]). Such an issue does not occur with timers. 
In the automaton with timers on the right, $t_x$ is set to 1 and $t_y$ is set to some 
arbitrary value in the transition to $q_1$. This gives $-1 \leq t_y - t_x < \infty$ for the set 
of all possible timer values. When $t_x$ times out, the value of $t_y$ could still be any 
value from 0 to $\infty$. When $t_x$ is set to 1 again, the set of possible timer values still 
satisfies the same constraint $-1 \leq t_y - t_x < \infty$, leading to a fixed point with a 
finite reachable state space. The fact that symbolic enumeration terminates on 
an automaton with timers was already observed in [22]. To our knowledge, later 
works on timed automata reachability never went back to timers, and there is 
no tool support that we know of to deal with models with timers directly. We 
find this surprising given that timers occur naturally while modeling real-time 
systems and moreover they enjoy this finiteness property. 
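
The contrast between the two enumerations can be mimicked with a few lines of Python. This is a 
toy calculation for the example of Fig. 1 only, not a zone implementation: ranges are represented 
by their end points (strictness of bounds is not tracked), and the function names are hypothetical. 

```python
import math


def clock_differences(n, iterations):
    """Successive values of y - x at q1: each b (guard x = 1, reset x := 0) adds one."""
    diffs = []
    for _ in range(iterations):
        diffs.append(n)
        n += 1
    return diffs


def timer_step(ty_range):
    """One b-transition with timers: t_x expires after one time unit and is set to 1 again.

    Takes the range of t_y before the step (with t_x = 1) and returns the new range of
    t_y together with the range of t_y - t_x. Assumes the upper end stays non-negative.
    """
    lo, hi = ty_range
    # One time unit elapses; timer values that would drop below 0 are excluded.
    lo, hi = max(lo - 1, 0), hi - 1
    return (lo, hi), (lo - 1, hi - 1)


print(clock_differences(1, 4))            # a new difference at every iteration
ty, diff = timer_step((0, math.inf))
print(ty, diff)                            # the same ranges as before the step
```

The clock differences never repeat, whereas the timer ranges are unchanged by the step, which is 
the fixed point mentioned in the text. 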

In addition to clocks and timers, event-clocks are another special type of clock 
variable used to deal with timing constraints [10]; they are attached 
to events. An event-recording clock for event $a$ maintains the time since the 
previous occurrence of $a$, whereas an event-predicting clock for $a$ gives the time 
to the next occurrence of $a$. Event-clocks have been used in the model of 
event-clock automata (ECA), and also in the logic of event-clocks [36]. These works 
argue that event-clocks can express typical real-time requirements. Theoretically, 
ECA can be determinized, and hence complemented. Therefore, model-checking 
an event-clock (logic or automaton) specification $\varphi$ over a timed automaton $\mathcal{A}$ 
can be reduced to reachability on the product of $\mathcal{A}$ and the ECA for $\neg\varphi$. This 
makes event-clocks a convenient feature in specifications. 

Recently, a symbolic enumeration algorithm for ECA was proposed [3]. It was 
noticed that when restricted to event-predicting clocks, the symbolic enumera- 
tion terminates without any additional checks (similar to the case of timers), 
whereas for the combination involving event-recording clocks, one needs simu- 
lation techniques from the timed automata literature. The same work showed 
how to adapt the best known simulation technique from timed automata into 
the setting of ECA. However, as discussed above, for model-checking we need 
a model containing conventional clocks, timers and event-clocks together. To our 
knowledge, no tool can directly work on such models. 

Our goal in this work is to provide a one-stop solution to real-time verification, 
be it reachability analysis or model-checking (over event-clock specifications), be 
it using models with clocks, or models with timers. We consider a unified model 
of a timed automaton over variables that can simulate normal clocks, timers and 
event-clocks. Here are our key contributions: 


1. We define a new model of generalized timed automata (GTA) which have 
two types of variables, called history clocks and future clocks. History clocks 
generalize normal clocks as well as event-recording clocks, while future clocks 
generalize event-predicting clocks and timers. However, unlike event-clocks, 
clocks in GTA are not necessarily associated with events. We also consider a 
generic syntax that allows for diagonal constraints between variables. 

2. We show undecidability of reachability for GTA, and study a safe subclass 
for which reachability is decidable. Safe GTA already subsume timed automata, 
event-clock automata (with diagonal constraints) and automata with timers. 

3. We adapt state-of-the-art symbolic enumeration techniques from timed 
automata literature to safe GTA. While we make use of ideas presented in [22] 
and [3], these works do not contain diagonal constraints between variables. 
Our main technical and theoretical innovation lies in a new termination anal- 
ysis of the symbolic enumeration in the presence of diagonal constraints. Sur- 
prisingly, we show that the enumeration terminates as long as the diagonal 
constraints are restricted to usual clocks and event-clocks, but not timers. 

4. We develop a prototype implementation of our model and algorithm in 
TCHECKER, an open-source platform for timed automata analysis, and show 
promising results on several existing and new benchmarks. To the best of 
our knowledge, our tool is the first that can handle event-clock automata, a 
model that has to date been the subject of many theoretical results. 


Related Works. In the work that first introduced ECA, a translation from ECA 
to a timed automaton was also proposed. However, this translation is not 
efficient: in the worst case, it incurs a blowup in the number of clocks 
and states. In [27,28], an extrapolation approach using maximal constants has 
been studied for ECA. However, it has been observed that simulation-based 
techniques are both more effective [14,16] and more efficient [5,24-26] than extrapolation 
for checking reachability. Recently, [3] proposed a zone-based reachability 
algorithm for diagonal-free ECA, using simulations for finiteness, but there was no 
accompanying implementation. Diagonal constraints have long been known to 
allow succinct modeling [15] for the class of timed automata, but only recently was a 
zone-based algorithm proposed that works directly on such automata. ECA 
with diagonals are more expressive than ECA [19]. In this work, we propose a 
zone-based algorithm for a unified model that subsumes ECA with diagonals. 

The use of history clocks and prophecy clocks in ECAs is in the same spirit 
as past and future modalities in temporal logics; this makes ECAs an attractive 
model for writing timed specifications. Indeed, this has also led to the 
development of various temporal logics with event-clocks [1,23,36]. ECA with diagonal 
constraints have been well studied, for instance in the context of timeline-based 
planning [19,20]. Finally, while there have been substantial advances in the 
theory of ECA, to the best of our knowledge, the only tool that handles ECA is 
TEMPO [37], and even this tool is restricted to just history clocks. 


Structure of the Paper. In Sect. 2 we start by defining the generalized model. 
Section 3 examines its expressiveness, while Sect. 4 deals with the reachability 
problem and the safe subclass. Section 5 develops the symbolic enumeration 
technique, while Sect. 6 explains how distance graphs can be extended to this setting. 
Section 7 is dedicated to finiteness. Finally, we provide our experimental results 
in Sect. 8 and conclude with Sect. 9. All the missing proofs can be found in the 
full version of the paper [2]. 


2 Generalized Timed Automata 


In this section we introduce the unified model. While we build on classical ideas 
from timed automata, almost every aspect is extended and below we highlight 
these changes. We define $X = X_H \uplus X_F$ to be a finite set of real-valued variables 
called clocks, where $X_H$ is the set of history clocks, and $X_F$ is the set of future 
clocks. History clocks always have a non-negative value and can increase 
arbitrarily along with time. Future clocks always have a non-positive value and can 
only increase until their values hit 0. History clocks simulate the usual clocks 
in timed automata and recording clocks of event-clock automata (ECA), and 
future clocks simulate timers and prophecy clocks of ECA. Both these clocks 
can take a special “undefined value” which marks that they are inactive. To deal 
with this naturally, we consider an extension of the reals with $+\infty$ and $-\infty$ as 
in [3]. The difference here is that we also have the so-called diagonal constraints. 


Extending Clock Constraints. Let $\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}$ denote the set of all 
real numbers along with $-\infty$ and $+\infty$. The usual $<$ order on reals is extended to 
deal with $\{-\infty, +\infty\}$ as: $-\infty < c < +\infty$ for all $c \in \mathbb{R}$ and $-\infty < +\infty$. Similarly, 
$\overline{\mathbb{Z}} = \mathbb{Z} \cup \{-\infty, +\infty\}$ denotes the set of all integers along with $-\infty$ and $+\infty$. 
Let $\mathbb{R}_{\geq 0}$ (resp. $\mathbb{R}_{\leq 0}$) be the set of non-negative (resp. non-positive) reals. Let 
$\mathcal{C} = \{(\lhd, c) \mid c \in \overline{\mathbb{R}} \text{ and } \lhd \in \{\leq, <\}\}$, called the set of weights. 

Let $X \cup \{0\}$ be the set obtained by extending the clocks of GTA with the 
special constant clock 0. Note that this clock will always have the value 0. Let 
$\Phi(X)$ denote the set of clock constraints generated by the following grammar: 
$\varphi ::= x - y \lhd c \mid \varphi \wedge \varphi$, where $x, y \in X \cup \{0\}$, $(\lhd, c) \in \mathcal{C}$ and $c \in \overline{\mathbb{Z}}$. The 
introduction of the special constant clock 0 allows us to treat constraints with 
just a single clock as special cases: the constraint $x \lhd c$ is equivalent to $x - 0 \lhd c$ 
and the constraint $c \lhd x$ is equivalent to $0 - x \lhd -c$. We often write $x = c$ 
as a shorthand for $x \leq c \wedge c \leq x$. Constraints of the form $x - y \lhd c$ will be 
called atomic constraints. A constraint of the form $x - y \lhd c$ is a diagonal (resp. 
non-diagonal) constraint if $x, y \neq 0$ (resp. $x = 0$ or $y = 0$). 

To evaluate the constraints allowed by $\Phi(X)$, we extend addition on real 
numbers with the convention that $(+\infty) + \alpha = \alpha + (+\infty) = +\infty$ for all $\alpha \in \overline{\mathbb{R}}$ 
and $(-\infty) + \beta = \beta + (-\infty) = -\infty$, as long as $\beta \neq +\infty$. We also extend the 
unary minus operation from real numbers to $\overline{\mathbb{R}}$ by setting $-(+\infty) = -\infty$ and 
$-(-\infty) = +\infty$. Abusing notation, we write $\beta - \alpha$ for $\beta + (-\alpha)$. Notice that with 
this extended addition, the minus operation does not distribute over addition (see footnote 1). 
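
The extended arithmetic is easy to get wrong in code, because IEEE floating point gives 
inf + (-inf) = nan rather than +inf. The following Python sketch implements the convention stated 
above; the function names ext_add and ext_sub are our own and not part of any tool. 

```python
import math

INF = math.inf


def ext_add(a, b):
    """Addition on the extended reals where +inf absorbs everything, including -inf."""
    if a == INF or b == INF:
        return INF
    if a == -INF or b == -INF:
        return -INF
    return a + b


def ext_sub(b, a):
    """b - a is defined as b + (-a)."""
    return ext_add(b, -a)


# Unary minus does not distribute over the extended addition:
a, b = INF, -INF
lhs = -ext_add(a, b)      # -(a + b)
rhs = ext_add(-a, -b)     # (-a) + (-b)
print(lhs, rhs)
```

The two printed values differ, which is the failure of distributivity noted in the text. 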


Footnote 1. Notice that $-(a + b) = (-a) + (-b)$ when $a$ or $b$ is finite or when $a = b$. But, when 
$a = +\infty$ and $b = -\infty$, then $-(a + b) = -\infty$ whereas $(-a) + (-b) = +\infty$. 


Extending Valuations. A valuation of clocks is a function $v : X \cup \{0\} \to \overline{\mathbb{R}}$ 
which maps the special clock 0 to 0, history clocks to $\mathbb{R}_{\geq 0} \cup \{+\infty\}$ and future 
clocks to $\mathbb{R}_{\leq 0} \cup \{-\infty\}$. We denote by $\mathbb{V}(X)$ or simply by $\mathbb{V}$ the set of valuations 
over $X$. We say that clock $x$ is defined (resp. undefined) in $v$ when $v(x) \in \mathbb{R}$ 
(resp. $v(x) \in \{-\infty, +\infty\}$). Let $x, y \in X \cup \{0\}$ be clocks (including 0) and let 
$(\lhd, c)$ be a weight. For valuations $v \in \mathbb{V}$, define $v \models y - x \lhd c$ as $v(y) - v(x) \lhd c$. 
We say that a valuation $v$ satisfies a constraint $\varphi$ in $\Phi(X)$, denoted as $v \models \varphi$, 
when $v$ satisfies all atomic constraints in $\varphi$. 

By definition, we easily check that the constraint $y - x \lhd c$ is equivalent to 
true (resp. false) when $(\lhd, c) = (\leq, +\infty)$ (resp. $(\lhd, c) = (<, -\infty)$). Constraints 
that are equivalent to true or false will be called trivial, whereas all others are 
non-trivial constraints. If $(\lhd, c) \neq (\leq, +\infty)$ then $v \models y - x \lhd c$ never holds when 
$v(x) = -\infty$. Also, if $v(x) = v(y) \in \{-\infty, +\infty\}$ then $v \models y - x \lhd c$ only holds for 
$(\lhd, c) = (\leq, +\infty)$. For a non-trivial constraint $y - x \lhd c$, we have 

- $v \models y - x \lhd c$ iff $v(y) < +\infty = v(x)$ or ($v(x)$ is finite and $v(y) \lhd v(x) + c$). 
- $v \models y - x \leq -\infty$ iff $v(y) < +\infty = v(x)$ or $v(y) = -\infty < v(x)$. 
- $v \models y - x < +\infty$ iff $v(x) \neq -\infty$ and $v(y) \neq +\infty$. 
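
These case distinctions can be checked mechanically. The Python sketch below evaluates an atomic 
constraint on a valuation using the extended subtraction defined earlier; the function names and 
the encoding of a weight as a (strict, c) pair are assumptions made for illustration. 

```python
import math

INF = math.inf


def ext_sub(b, a):
    """b - a = b + (-a), where +inf absorbs in the extended addition."""
    if b == INF or -a == INF:
        return INF
    if b == -INF or -a == -INF:
        return -INF
    return b - a


def satisfies(v, y, x, strict, c):
    """Check whether v satisfies y - x < c (strict) or y - x <= c (non-strict)."""
    diff = ext_sub(v[y], v[x])
    return diff < c if strict else diff <= c


# v(x) = -inf: only the trivial constraint (<=, +inf) holds.
v1 = {"x": -INF, "y": -2}
only_trivial = (satisfies(v1, "y", "x", False, INF), satisfies(v1, "y", "x", False, 5))

# v(y) < +inf = v(x): every non-trivial constraint holds, even (<=, -inf).
v2 = {"x": INF, "y": 7}
holds_for_minus_inf = satisfies(v2, "y", "x", False, -INF)

# y - x < +inf holds iff v(x) != -inf and v(y) != +inf.
v3 = {"x": 3, "y": -INF}
below_plus_inf = satisfies(v3, "y", "x", True, INF)
print(only_trivial, holds_for_minus_inf, below_plus_inf)
```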


We abuse notation and, for $Y \subseteq X$, we define $Y \lhd c$ as $\bigwedge_{y \in Y} y \lhd c$, and $Y = c$ 
as $\bigwedge_{y \in Y} y = c$. We denote by $v + \delta$ the valuation obtained from valuation $v$ 
by increasing by $\delta \in \mathbb{R}_{\geq 0}$ the value of all clocks in $X$. Note that, from a given 
valuation, not every time elapse results in a valuation, since future clocks need to stay 
at most 0. For example, from a valuation with $v(x) = -3$ and $v(y) = -2$, where 
$x, y$ are future clocks, one can elapse at most 2 time units. 
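
The bound on time elapse in this example is simply the smallest distance of a future clock to 0. A 
minimal Python sketch, with hypothetical function names and valuations encoded as dictionaries: 

```python
import math


def max_delay(v, future_clocks):
    """Largest delta such that every future clock of v + delta is still at most 0."""
    return min((-v[x] for x in future_clocks), default=math.inf)


def elapse(v, delta, future_clocks):
    """Return v + delta, or raise if a future clock would exceed 0."""
    if delta < 0 or delta > max_delay(v, future_clocks):
        raise ValueError("delay not allowed from this valuation")
    return {x: value + delta for x, value in v.items()}


v = {"x": -3, "y": -2}
bound = max_delay(v, ["x", "y"])
after = elapse(v, bound, ["x", "y"])
print(bound, after)
```

An undefined future clock (value -inf) imposes no bound, since its distance to 0 is infinite. 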


Extending Resets. For history clocks, the reset operation sets the clock to 0. 
For future clocks, the reset operation says that all constraints on the clock must 
be discarded, i.e., the clock is released. Given that the set of clocks is partitioned 
into history clocks and future clocks, we use the same notation $[R]v$ to talk about 
the change of clocks in $R$, whether it be reset or release. Formally, given a set of 
clocks $R \subseteq X$, we define $[R]v$ as $\{v' \in \mathbb{V} \mid v'(x) = 0\ \forall x \in R \cap X_H \text{ and } v'(x) = 
v(x)\ \forall x \notin R\}$. Observe that the release operation is implicit: each future clock in 
$R$ could take any value (not necessarily the same) from $[-\infty, 0]$ in $[R]v$. Note that 
$[R]v$ is a singleton when $R$ contains only history clocks; this corresponds exactly 
to the reset operation in timed automata. Then, we simply write $v' = [R]v$ 
instead of $\{v'\} = [R]v$. When $R$ contains only future clocks, $[R]v$ is the set 
of valuations obtained by releasing each clock in $R$ while keeping the value of 
all other clocks unchanged. For $W \subseteq \mathbb{V}$, we let $[R]W = \bigcup_{v \in W} [R]v$. We have 
$[R \cup R']W = [R]([R']W)$. 


Extending Guards and Transitions. Before we define GTA, let us focus 
on the language to specify transitions. In normal timed automata, as shown in 
Fig. 2, a transition reads a letter, checks a guard $g \in \Phi(X_H)$ and then resets a 
subset $R$ of (history) clocks. But in any one transition only one guard-reset pair 
is performed, and guards and resets cannot be interleaved. 


Fig. 2. A transition of a TA, labelled a, g, R (left), and of a GTA, labelled a, prog (right) 

We generalize this to our setting with history and future clocks but also to 
allow arbitrary interleaving of guards and changes (to model this with a TA one 
may use a sequence of multiple transitions without delays in-between.) Formally, 
an instantaneous timed program is generated by the following grammar: 


prog := guard | change | prog; prog 


where guard $= g \in \Phi(X)$ and change $= [R]$ for some $R \subseteq X$. While guard and 
change are atomic programs, prog; prog refers to sequential composition. The 
set of all programs generated by the above grammar will be denoted Programs. 
Then on a transition, we simply have a pair of letter label and an instantaneous 
timed program, e.g., (a, prog) in Fig. 2 (right). 

The semantics for programs on a transition must generalize semantics for 
guards (defined using the satisfaction relation $\models$ above) and resets/releases (defined 
using $[R]$ above). But there is an obvious difference between these two: a guard 
may be crossed only if the valuation before the guard satisfies it, whereas a 
change (reset or release) defines a relation between the valuations before and 
after the change. To capture both in a uniform way, we define the semantics 
of programs as relations on pairs of valuations. Formally, for $v, v' \in \mathbb{V}$ and 
$\mathit{prog} \in$ Programs, we define when the pair $(v, v')$ is related by $\mathit{prog}$, more 
conveniently written as $v \xrightarrow{\mathit{prog}} v'$, inductively: 

- $v \xrightarrow{g} v'$ if $v \models g$ and $v' = v$, 
- $v \xrightarrow{[R]} v'$ if $v' \in [R]v$, 
- $v \xrightarrow{\mathit{prog}_1; \mathit{prog}_2} v'$ if $\exists v'' \in \mathbb{V}$ such that $v \xrightarrow{\mathit{prog}_1} v''$ and $v'' \xrightarrow{\mathit{prog}_2} v'$. 

Now, we have all the pieces necessary to define our generalized model. 


Definition 1 (Generalized timed automata). A generalized timed 
automaton $\mathcal{A}$ is given by a tuple $(Q, \Sigma, X, \Delta, (q_0, g_0), (Q_f, g_f))$, where $Q$ is a 
finite set of states, $\Sigma$ is a finite alphabet of actions, $X = X_F \uplus X_H$ is a set of 
clocks partitioned into future and history clocks, the initialization condition is a 
pair comprising an initial state $q_0 \in Q$ and an initial guard $g_0 \in \Phi(X)$ which 
should be satisfied by initial valuations, similarly, the final condition is a pair 
comprising a set of final states $Q_f \subseteq Q$ along with a final guard $g_f$ that must 
be satisfied by final valuations, and $\Delta \subseteq (Q \times \Sigma \times \text{Programs} \times Q)$ is a finite set 
of transitions. $\Delta$ contains transitions of the form $(q, a, \mathit{prog}, q')$, where $q$ is the 
source state, $q'$ is the target state, $a$ is the action triggering the transition, and 
$\mathit{prog}$ is the instantaneous timed program that is executed in sequence (from left 
to right) while firing the transition. 


The semantics of a GTA $\mathcal{A} = (Q, \Sigma, X, \Delta, (q_0, g_0), (Q_f, g_f))$ is given by a transition system $TS_{\mathcal{A}}$ whose states are configurations $(q, v)$ of $\mathcal{A}$, where $q \in Q$ and $v \in V$ is a valuation. A configuration $(q, v)$ is initial if $q = q_0$ and $v \models g_0$. A configuration $(q, v)$ is accepting if $q \in Q_f$ and $v \models g_f$. Transitions of $TS_{\mathcal{A}}$ are of two forms: (1) delay transition: $(q, v) \xrightarrow{\delta} (q, v + \delta)$ if $(v + \delta) \models X_F \le 0$, and (2) discrete transition: $(q, v) \xrightarrow{t} (q', v')$ if $t = (q, a, prog, q') \in \Delta$ and $v \xrightarrow{prog} v'$.

Thus, a discrete transition $t = (q, a, prog, q')$, where $prog = prog_1; \ldots; prog_n$, can be taken from $(q, v)$ if there are valuations $v_1, \ldots, v_n$ such that $v \xrightarrow{prog_1} v_1 \xrightarrow{prog_2} \cdots \xrightarrow{prog_n} v_n = v'$. A run of a GTA is a finite sequence of transitions from an initial configuration of $TS_{\mathcal{A}}$. A run is said to be accepting if its last configuration is accepting.
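To make the semantics concrete, the following minimal Python sketch executes a program on a single valuation. It is an illustration only and rests on assumptions not fixed in this section: valuations are dictionaries, a change [R] resets history clocks to 0 and releases future clocks to a non-positive value, and the nondeterministic choice made at each release is supplied by the caller through the hypothetical parameter `picks`. The names `Guard`, `Change`, `run_program` and `delay` are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Valuation = Dict[str, float]


@dataclass(frozen=True)
class Guard:
    """Atomic program g: a predicate that the current valuation must satisfy."""
    test: Callable[[Valuation], bool]


@dataclass(frozen=True)
class Change:
    """Atomic program [R]: reset history clocks in R, release future clocks in R."""
    clocks: FrozenSet[str]


def run_program(prog, v, future, picks):
    """Execute the atomic programs of `prog` from left to right, starting at v.

    `picks` lists the values chosen for released future clocks, consumed in
    order (clocks of one change are visited in sorted order).
    Returns the resulting valuation, or None if some guard fails.
    """
    v = dict(v)
    picks = iter(picks)
    for atom in prog:
        if isinstance(atom, Guard):
            if not atom.test(v):
                return None
        else:
            for x in sorted(atom.clocks):
                if x in future:
                    value = next(picks)
                    if value > 0:
                        raise ValueError("future clocks must stay non-positive")
                    v[x] = value
                else:
                    v[x] = 0.0  # assumed: history clocks are reset to 0
    return v


def delay(v, d, future):
    """Delay transition: add d to all clocks, allowed only if X_F <= 0 afterwards."""
    w = {x: value + d for x, value in v.items()}
    return w if all(w[x] <= 0 for x in future) else None
```

For instance, running the program (x = 0; [y]; y = z - 1; [z]; z = y) from x = y = z = 0 with picks -1, -1 yields y = z = -1, while any other pick for y is blocked by the guard that follows the release.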


3 Expressivity of GTA and Examples 


The GTA model defined above is rather expressive. Figure 3 illustrates an example which accepts words of the form $a^n b^m$ with $m \le n$, where each $a$ occurs at time 0, after which $b$'s are seen one by one, with distance 1 between them. The history clock $x$ is used to ensure the timing constraint. For every $a$ that is read, the future clocks $y, z$ decrease by 1. Hence the future clocks $y, z$ maintain the opposite of the number of $a$'s seen. When the automaton starts reading $b$, the future clocks also start elapsing time and, since they cannot go above 0, the number of $b$'s is at most the number of $a$'s. Such a language cannot be accepted by timed automata, since the untimed language obtained by removing the time stamps must be regular in the case of timed automata. The GTA model is not only expressive, it is also convenient to use. To see this, we now show that three classical models of timed systems can be easily captured using GTA. We also illustrate the modeling convenience provided by GTA in Sect. 8 based on experiments.


[Figure 3: a GTA whose transitions are labeled $(a, prog_1)$ and $(b, prog_2)$. History clocks: $\{x\}$. Future clocks: $\{y, z\}$. $prog_1$: $(x = 0; [y]; y = z - 1; [z]; z = y)$. $prog_2$: $(x = 1; [x])$. Initial condition: $y = z = 0$. Final condition: true.]


Fig. 3. Example of a GTA 


Timed automata. Timed automata (TA) of Alur-Dill [9] can be modeled as a GTA as follows: (1) The set of states of the GTA is the same as the set of states of the TA. (2) There are no future clocks in the GTA and its history clocks are the clocks of the TA. (3) Each transition of the form $q \xrightarrow{a, g, R} q'$ in a TA, where $g$ is a guard, $a$ a letter and $R$ a subset of clocks to be reset, is replaced by a transition $q \xrightarrow{a, prog} q'$ where $prog = (g; [R])$. (4) Initially, all clocks must be 0, captured by setting $g_0 = (X_H = 0)$. (5) The final guard is trivial: $g_f = \mathit{True}$.

Event-clock Automata. Event-clock automata (ECA) of [10] can be modeled as a GTA as follows: (1) The set of states of the GTA is the same as the set of states of the ECA. (2) For each $a \in \Sigma$, the GTA has a history clock $\overleftarrow{a}$ and a future clock $\overrightarrow{a}$. (3) Each transition of the form $q \xrightarrow{a, g} q'$ in an ECA, where $g$ is a guard of the ECA and $a$ a letter, is replaced by a transition $q \xrightarrow{a, prog} q'$ where $prog := ((\overrightarrow{a} = 0); [\overrightarrow{a}]; g; [\overleftarrow{a}])$. (4) At initialization, history clocks must be undefined (set to $\infty$), captured by $g_0 = (X_H = \infty)$. (5) At acceptance, all future clocks must be undefined, i.e., $g_f = (X_F = -\infty)$.


Automata with Timers. The third model we consider is that of automata 
with timers. Timers are timing constructs that are started/initialized with a 
certain time value at some point/event and count down to 0. They measure 
the time from when they were started till the timer hits 0, where the event of 
hitting 0 is called timeout. However, they can be stopped using a stop event at 
any intermediate point instead and in which case the timer must be freed for 
reuse later. Timers are a common construct in protocol specification, e.g., the 
ITU standard which uses timers rather than clocks [30] and Mealy machines 
with timers [31]. 

In our setting, a timer can be seen as a specific instance of a future clock. More precisely, automata with timers ($A_T$) can be modeled as GTA as follows: (1) The set of states of the GTA is the same as the set of states of $A_T$. (2) The future clocks of the GTA are the timers of $A_T$ and there are no history clocks. (3) Initially, the timers are undefined, captured by $g_0 = (X_F = -\infty)$, and $g_f = \mathit{True}$. (4) A transition of $A_T$ with action $a$ from $q$ to $q'$ is encoded as $q \xrightarrow{a, prog} q'$ with:

- if the transition starts timer $x$ with value $c \in \mathbb{R}_{\ge 0}$, then $prog = (x = -\infty; [x]; x = -c)$.

- if the transition is guarded by timeout($x$), then $prog = (x = 0; [x]; x = -\infty)$.

- if the transition stops timer $x$, then $prog = ([x]; x = -\infty)$.
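The three timer programs can be written down as plain data and executed on the value of a single timer. The sketch below is illustrative only: the tuple encoding and the names `start_timer`, `timeout`, `stop_timer` and `run` are hypothetical, and it relies on the observation that a release immediately followed by an equality guard pins the released clock to that value.

```python
NEG_INF = float("-inf")


def guard(clock, value):
    """Equality guard `clock = value`."""
    return ("guard", clock, value)


def release(clock):
    """Release [clock] of a future clock."""
    return ("release", clock)


def start_timer(x, c):
    """Start timer x with value c: (x = -inf; [x]; x = -c)."""
    if c < 0:
        raise ValueError("timer values are assumed non-negative")
    return [guard(x, NEG_INF), release(x), guard(x, -c)]


def timeout(x):
    """Transition guarded by timeout(x): (x = 0; [x]; x = -inf)."""
    return [guard(x, 0), release(x), guard(x, NEG_INF)]


def stop_timer(x):
    """Stop timer x at any value: ([x]; x = -inf)."""
    return [release(x), guard(x, NEG_INF)]


def run(prog, value):
    """Run a single-timer program from `value`; return None if a guard fails.

    After a release the value is unconstrained until the next equality
    guard, which then fixes it.
    """
    free = False
    for atom in prog:
        if atom[0] == "release":
            free = True
        else:
            target = atom[2]
            if free:
                value, free = target, False
            elif value != target:
                return None
    return value
```

Starting a timer that is already running is blocked by the first guard, whereas stopping is possible at any value, which is the difference from prophecy clocks discussed next.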


We note that the timer above differs from a prophecy-event-clock (of ECA) 
though both are future clocks. Prophecy-clocks are released only when the event 
is seen, so at that point the value of the prophecy-clock must be 0. On the other 
hand timers can be stopped and released even when their value is not 0. This 
subtle difference has a surprising impact when we allow diagonal guards. 


4 The Reachability Problem for GTA 


We are interested in the reachability problem for GTA: given a GTA $\mathcal{A}$, does it have an accepting run? For normal TA, the reachability problem is decidable and PSPACE-complete, as shown in [9]. This was shown using the so-called region abstraction, by proving the existence of a finite time-abstract bisimulation. However, this is not the case for GTA. As explained in the previous section, GTA capture ECA, and as shown in [27,28], there exist ECA for which there is no finite time-abstract bisimulation. However, reachability is still decidable in the specific case of ECA, as shown in [10]. We note that the ECA model of [27,28] has no diagonal constraints. In this case they show decidability via zone-extrapolation. In [3], another approach for decidability via zone simulations is shown. But again, even in this model, diagonal constraints are disallowed. Even more critically, in GTA we can capture timers, and a priori we can have diagonal constraints even among timers. So, the question we ask is whether reachability is still decidable for GTA. Surprisingly, the answer is no. The intuition is that with future clocks and diagonal constraints, we get the ability to count (cf. Fig. 3).


Theorem 2. Reachability for GTA is undecidable. 


Proof. We reduce from counter machines. Given a counter machine, we will build a GTA with one future clock $y_C$ for each counter $C$ and one extra future clock $z$. The reduction uses diagonal constraints between $z$ and the future clocks $y_C$.

Initially and after each transition, the value of the future clock $z$ will be 0. Since a future clock has to be non-positive, time elapse is impossible. As an invariant, the value of the future clock $y_C$ is the opposite of the value of counter $C$. The operations on counter $C$ are encoded with the following programs: (1) $zero_C = (y_C = 0)$, (2) $inc_C = ([z]; z = y_C - 1; [y_C]; y_C = z; [z]; z = 0)$, (3) $dec_C = (y_C \le -1; [z]; z = y_C + 1; [y_C]; y_C = z; [z]; z = 0)$. In programs $inc_C$ and $dec_C$, each release of a future clock is followed by a constraint which restricts the value non-deterministically chosen during the release. For instance, $[z]; z = y_C - 1$ is equivalent to $z := y_C - 1$. Hence, the overall effect of $inc_C$ is $y_C := y_C - 1$, maintaining all other clocks unchanged, including the invariant $z = 0$.
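The effect of the three programs can be traced with a few lines of Python. This is only a sketch of the encoding: each "release followed by an equality" is replaced by the assignment it is equivalent to, and the function names `inc`, `dec` and `is_zero` are chosen for this example.

```python
def inc(y, z):
    """inc_C = ([z]; z = y_C - 1; [y_C]; y_C = z; [z]; z = 0)."""
    z = y - 1   # [z]; z = y_C - 1
    y = z       # [y_C]; y_C = z
    z = 0       # [z]; z = 0
    return y, z


def dec(y, z):
    """dec_C = (y_C <= -1; [z]; z = y_C + 1; [y_C]; y_C = z; [z]; z = 0)."""
    if not y <= -1:
        return None  # guard fails: the counter is 0
    z = y + 1
    y = z
    z = 0
    return y, z


def is_zero(y):
    """zero_C = (y_C = 0)."""
    return y == 0


# The invariant y_C = -(value of C) and z = 0 is preserved by every step.
y, z = 0, 0
for _ in range(3):
    y, z = inc(y, z)
counter_after_three_incs = -y
```

Since all intermediate values stay non-positive, every step is a legal release of a future clock.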


Given this negative result, what can we do? A careful observation of the proof tells us that it is the interplay between diagonal constraints and arbitrary releases of future clocks that leads to undecidability. More precisely, the encoding depends on the fact that clocks $z$ and $y_C$, which are used in diagonal constraints ($z = y_C - 1$, $z = y_C + 1$ and $y_C = z$), may have arbitrary values when they are released. This suggests a restricted subclass that we formalize next.


Definition 3 (Safe GTA). Let $X_D \subseteq X_F$ be a subset of future clocks. A program $prog = (g_1; [R_1]; g_2; [R_2]; \cdots; g_k; [R_k]; g_{k+1})$ is $X_D$-safe if

- diagonal constraints between future clocks are restricted to clocks in $X_D$: if $x - y \triangleleft c$ with $x, y \in X_F$ occurs in some $g_i$ then $x, y \in X_D$;

- clocks in $X_D$ should be 0 or $-\infty$ before being released: if $x \in X_D \cap R_i$ then $x = 0$ or $x = -\infty$ occurs in $g_i$.

A GTA $\mathcal{A}$ is $X_D$-safe if it only uses $X_D$-safe programs on its transitions and the initial guard $g_0$ sets each history clock to either 0 or $\infty$.
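The two syntactic conditions of Definition 3 are easy to check mechanically. The sketch below is a hypothetical encoding, not the tool's input format: an atomic constraint is a tuple `(x, y, op, c)` read as x - y op c, the string `"0"` stands for the zero clock, equalities are allowed as atoms for convenience, and a program is given as the list of pairs (g_i, R_i) plus the final guard g_{k+1}.

```python
import math

ZERO = "0"  # name used here for the zero clock


def is_safe(pairs, final_guard, future, x_d):
    """Check X_D-safety of the program (g_1; [R_1]; ...; g_k; [R_k]; g_{k+1}).

    pairs       -- list of (g_i, R_i); g_i is a list of atoms, R_i a set of clocks
    final_guard -- the list of atoms of g_{k+1}
    future      -- the set X_F of future clocks
    x_d         -- the subset X_D of future clocks
    """
    guards = [g for g, _ in pairs] + [final_guard]
    # Condition 1: diagonals between future clocks only mention clocks of X_D.
    for g in guards:
        for (x, y, _op, _c) in g:
            if x in future and y in future and not {x, y} <= x_d:
                return False
    # Condition 2: a clock of X_D is checked to be 0 or -inf in g_i before [R_i].
    for g, released in pairs:
        for x in released & x_d:
            pinned = any(
                atom in ((x, ZERO, "==", 0), (x, ZERO, "==", -math.inf))
                for atom in g
            )
            if not pinned:
                return False
    return True
```

With this encoding, the program inc_C from the proof of Theorem 2 is rejected for every choice of X_D, while the program starting a timer is accepted.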


Observe that the three examples discussed in Sect. 3 are safe. Timed automata do not have future clocks so the condition is vacuously true. In ECA, event-predicting clocks are always checked for 0 before being released, hence they are safe as well with $X_D = X_F$. Automata with timers without diagonal constraints are also trivially safe with $X_D = \emptyset$. The importance of safety is the following theorem which is the centerpiece of this article.


Theorem 4. Reachability for $X_D$-safe GTA is decidable.


We will establish this theorem by showing a finite, sound and complete zone based reachability algorithm for $X_D$-safe GTA. If the given GTA is not $X_D$-safe, then we lose proof of termination (unsurprisingly, since the problem is undecidable), but we still maintain soundness. Thus, even for such GTA, when our algorithm does terminate it will give the correct answer.

5 Symbolic Enumeration


We adapt the G-simulation framework presented in [26] for timed automata with 
diagonal constraints to GTA. Diagonal constraints offer succinct modeling [15], 
but are quite challenging to handle efficiently in zone-based algorithms, and 
have led to pitfalls in the past: [14] showed that the erstwhile algorithm based 
on zone-extrapolations that was implemented in tools is incorrect for models 
with diagonal constraints; moreover no extrapolation based method can work 
for automata with diagonal constraints. The simulation framework by-passes this 
impossibility result and is the state-of-the-art for timed automata with diagonal 
constraints. The framework was extended to event-clock automata without diag- 
onal constraints in [3]. We show that the ideas from [26] and [3] can be suitably 
combined to give an effective procedure for safe GTAs. This extension to GTAs 
enables us to understand the mechanics of diagonal constraints in future clocks. 
The algorithm based on the G-simulation framework involves: 


1. computation of a set of constraints at every state of the automaton by a static 
analysis of the model, 

2. a symbolic enumeration using zones to compute the zone graph, 

3. a simulation relation between zones to ensure termination of the enumeration. 


We will next adapt the static analysis to the GTA setting. The algorithm for 
the zone graph computation and the implementation of the simulation relation 
over zones is taken off-the-shelf from [26] and [3], except for a minor adaptation 
to include diagonal constraints involving future clocks. What is absent, and 
requires a non-trivial analysis, is the proof of termination. Therefore, we will
mainly focus on this aspect and devote Sect. 7 to the termination argument.


G-Simulation and the Static Analysis for GTA. We fix a GTA $\mathcal{A} = (Q, \Sigma, X, \Delta, (q_0, g_0), (Q_f, g_f))$ for this section. Our goal is to define a simulation relation on the semantics of $\mathcal{A}$, i.e., on $TS(\mathcal{A})$. In the subsequent sections we will lift this to zones and show its finiteness. A simulation relation on $TS(\mathcal{A})$ is a reflexive, transitive relation $(q, v) \preceq (q, v')$ relating configurations with the same control state and (1) for every $(q, v) \xrightarrow{\delta} (q, v + \delta)$, we have $(q, v') \xrightarrow{\delta} (q, v' + \delta)$ and $(q, v + \delta) \preceq (q, v' + \delta)$, (2) for every transition $t$, if $(q, v) \xrightarrow{t} (q_1, v_1)$ for some valuation $v_1$, then $(q, v') \xrightarrow{t} (q_1, v_1')$ for some valuation $v_1'$ with $(q_1, v_1) \preceq (q_1, v_1')$.

For any set $G$ of atomic constraints, we define a preorder $\preceq_G$ on valuations:

$v \preceq_G v'$ if $\forall \varphi \in G,\ \forall \delta \ge 0,\ v + \delta \models \varphi \implies v' + \delta \models \varphi$.

Notice that in the definition above, we do not restrict $\delta$ to those such that $v + \delta$ is a valuation: we may have $v(x) + \delta > 0$ for some $x \in X_F$. In usual timed automata, this question does not arise, as elapsing any $\delta$ from any given valuation always results in a valuation. But this is crucial for the proof of Theorem 5 below.

Intuitively, the preorder above is a simulation w.r.t. the constraints in $G$ even after time elapse. But we need this to also be a simulation w.r.t. discrete transitions. To achieve this, the set of constraints $G$ should depend on the available discrete transitions. In fact, we define a map $\mathcal{G}$ from states to sets of constraints, in such a way that it captures the simulation w.r.t. the discrete actions. In other words, our focus will be to choose state-dependent sets of constraints (given by the map $\mathcal{G}$) depending on $\mathcal{A}$ such that the resulting preorder induces a simulation on $TS(\mathcal{A})$.

As a first step towards this, we define, for any set $G$ of constraints and any program $prog$, a set of constraints $G' = pre(prog, G)$ such that, if $v \preceq_{G'} v'$ and $v \xrightarrow{prog} v_1$, then there exists $v' \xrightarrow{prog} v_1'$ such that $v_1 \preceq_G v_1'$. This set is defined inductively as follows ($G$ is a set of atomic constraints, $R$ is a set of clocks, $g$ is an arbitrary constraint, $y - x \triangleleft c$ is an atomic constraint):

$pre(prog_1; prog_2, G) = pre(prog_1, pre(prog_2, G))$

$pre(g, G) = split(g) \cup G$

$pre([R], G) = \bigcup_{\varphi \in G} pre([R], \{\varphi\})$

$pre([R], \{y - x \triangleleft c\}) = \{y - x \triangleleft c\}$ if $x, y \notin R$; $\{y \triangleleft c\}$ if $x \in R$, $y \notin R$; $\{-x \triangleleft c\}$ if $x \notin R$, $y \in R$; $\emptyset$ if $x, y \in R$

where $split(g)$ is the set of atomic constraints occurring in $g$.
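A direct transcription of the pre operator into Python may help in reading the definition. The sketch makes several assumptions: atoms are tuples `(y, x, op, c)` read as y - x op c with `"0"` as the zero clock, a program is a list of `("guard", atoms)` and `("change", clocks)` items, and the case split used for a change [R] (a changed clock is replaced by the zero clock, and a constraint between two changed clocks is dropped) is the reading assumed here for the definition.

```python
ZERO = "0"  # name used here for the zero clock


def pre_change(R, phi):
    """pre([R], {phi}) for an atomic constraint phi = (y, x, op, c)."""
    y, x, op, c = phi
    if x in R and y in R:
        return set()
    if x in R:
        x = ZERO
    if y in R:
        y = ZERO
    if x == ZERO and y == ZERO:
        return set()  # constant constraint, identical on both valuations
    return {(y, x, op, c)}


def pre(prog, G):
    """pre(prog, G): process the atomic programs of prog from right to left."""
    G = set(G)
    for kind, arg in reversed(prog):
        if kind == "guard":
            G |= set(arg)  # pre(g, G) = split(g) ∪ G
        else:
            G = set().union(*(pre_change(arg, phi) for phi in G))
    return G
```

For example, for the program (x <= 5; [x]; y - x <= 2) and an empty target set, the diagonal y - x <= 2 turns into y <= 2 when it is pulled back through [x], and the guard x <= 5 is added on top.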

Now, the choice of suitable $G$ will be obtained by static analysis, on the lines of what was done for timed automata with diagonals [24-26], but adapted to our more powerful model. More precisely, we define the map $\mathcal{G}$ from $Q$ to sets of atomic constraints as the least fixpoint of the set of equations:

$\mathcal{G}(q) = \{x \le 0 \mid x \in X_F\} \cup \bigcup_{q \xrightarrow{a, prog} q'} pre(prog, \mathcal{G}(q'))$    (1)

Finally, based on $\preceq_G$ and the $\mathcal{G}(q)$ computation, we can define a preorder $\preceq_{\mathcal{A}}$ between configurations of $TS(\mathcal{A})$ as $(q, v) \preceq_{\mathcal{A}} (q', v')$ if $q = q'$ and $v \preceq_{\mathcal{G}(q)} v'$. We then show that $\preceq_{\mathcal{A}}$ defined above is indeed a simulation relation.
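The least fixpoint of equation (1) can be computed by plain iteration. The sketch below is illustrative: the function `static_analysis` takes the pre operator as a parameter, atoms are tuples `(y, x, op, c)` read as y - x op c with `"0"` as the zero clock, and `pre_guards_only` is a hypothetical stand-in that is adequate only for programs made of guards.

```python
def static_analysis(states, transitions, future, pre):
    """Least fixpoint of G(q) = {x <= 0 | x in X_F} ∪ ⋃ pre(prog, G(q')).

    transitions -- list of (q, prog, q') triples (the action is irrelevant here)
    pre         -- function (prog, set_of_atoms) -> set_of_atoms
    """
    base = {(x, "0", "<=", 0) for x in future}
    G = {q: set(base) for q in states}
    changed = True
    while changed:
        changed = False
        for q, prog, q2 in transitions:
            new = pre(prog, G[q2]) - G[q]
            if new:
                G[q] |= new
                changed = True
    return G


def pre_guards_only(prog, G):
    """Stand-in for pre when prog is a list of guards (lists of atoms)."""
    return set(G) | {atom for g in prog for atom in g}
```

The iteration terminates whenever pre can only produce finitely many atoms, which is the case here since every atom originates from a guard of the automaton.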


Theorem 5. The relation $\preceq_{\mathcal{A}}$ is a simulation on the transition system $TS_{\mathcal{A}}$.


Zones for GTA and the Zone Graph Computation. Roughly, zones [12] are sets of valuations that can be represented efficiently using constraints between differences of clocks. In this section, we introduce an analogous notion for generalized timed automata. We consider GTA zones, or simply zones, which are special sets of valuations of GTA. A GTA zone is a set of valuations satisfying a conjunction of constraints of the form $y - x \triangleleft c$, where $x, y \in X \cup \{0\}$, $c \in \mathbb{Z}$ and $\triangleleft \in \{\le, <\}$. Thus zones are an abstract representation of sets of valuations. Then, an abstract configuration, also called a node, is a pair consisting of a state and a zone. Firing a transition $t := (q, a, prog, q')$ in a GTA $\mathcal{A}$ from node $(q, Z)$ will result in another node following a sequence of operations that we now define.

GTA Zone Operations. Let $g$ be a guard, $R \subseteq X$ a set of clocks and $Z$ a GTA zone.

- Guard intersection: $Z \cap g := \{v \mid v \in Z \text{ and } v \models g\}$
- Release/Reset: $[R]Z := \bigcup_{v \in Z} [R]v$ (as defined in Sect. 2)
- Time elapse: $\overrightarrow{Z} := \{v + \delta \mid v \in Z, \delta \in \mathbb{R}_{\ge 0} \text{ s.t. } v + \delta \models (X_F \le 0)\}$

Successor Computation. We can show that starting from a zone $Z$, the successors
after the above operations are also zones (see Theorem 29 in [2]). A guard $g$ can be seen as yet another zone and hence guard intersection is just an intersection operation between two zones. Similarly, the change operation preserves zones. Finally, as is usual with timed automata, zones are closed under the time elapse operation.

Thus, for a transition $t := (q, a, prog, q')$ and a node $(q, Z)$, we can define the successor node $(q', Z')$, and we write $(q, Z) \xrightarrow{t} (q', Z')$, where $Z'$ is the zone computed by the following sequence of operations: Let $prog = prog_1; \ldots; prog_n$, where each $prog_i$ is an atomic program, i.e., a guard or a change. Then we define zones $Z_1, \ldots, Z_{n+1}$ where $Z_1 = Z$, $Z' = \overrightarrow{Z_{n+1}}$, and for each $1 \le i \le n$, $Z_{i+1} = Z_i \cap g_i$ if $prog_i$ is a guard $g_i$, and $Z_{i+1} = [R_i]Z_i$ if $prog_i$ is a change $[R_i]$.

Now, we can lift zone graphs and simulations from TA to GTA and obtain a symbolic reachability algorithm for GTA.


Definition 6 (GTA zone graph). Given a GTA $\mathcal{A}$, its GTA zone graph, denoted GZG($\mathcal{A}$), is defined as follows: Nodes are of the form $(q, Z)$ where $q$ is a state and $Z$ is a GTA zone. The initial node is $(q_0, Z_0)$ where $q_0$ is the initial state and $Z_0$ is the set of all valuations which satisfy the initial constraint $g_0$: $Z_0$ is given by $g_0 \wedge (X_F \le 0) \wedge (X_H \ge 0)$. For every node $(q, Z)$ and every transition $t := (q, a, prog, q')$ of $\mathcal{A}$, there is a transition $(q, Z) \xrightarrow{t} (q', Z')$ in the GTA zone graph. A node $(q, Z)$ is accepting if $q \in Q_f$ and $Z \cap g_f$ is non-empty, i.e., there exists a valuation in $Z$ satisfying the final constraint.


Similar to the case of zone graphs for timed automata and event zone graphs 
for ECA, the GTA zone graph can be used to decide reachability for generalized 
timed automata. A node (q, Z) is said to be reachable (in A) if there is a path 
from the initial node (qo, Zo) to (q, Z) in GZG(A). Thus, reachability of a final 
state in A reduces to checking reachability of an accepting node in GZG(A). 
However, as in the case of zone graphs for timed automata, GZG(A) is also not 
guaranteed to be finite. Hence, we need to compute a finite truncation of the 
GTA zone graph, which is still sound and complete for reachability. 


Definition 7 (Simulation on GTA zones and finiteness). Let $\preceq$ be a simulation relation on $TS(\mathcal{A})$. For two GTA zones $Z, Z'$, we say $(q, Z) \preceq (q, Z')$ if for every $v \in Z$ there exists $v' \in Z'$ such that $(q, v) \preceq (q, v')$. The simulation $\preceq$ is said to be finite if for every sequence $(q, Z_1), (q, Z_2), \ldots$ of reachable nodes, there exists $j > i$ such that $(q, Z_j) \preceq (q, Z_i)$.
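The role of a finite simulation in the enumeration of the zone graph can be sketched generically: a node is not explored further when a previously stored node simulates it. The code below is a schematic worklist loop, not the implementation evaluated later; the parameters `successors`, `is_accepting` and `subsumed_by` are hypothetical callbacks, and the toy usage abstracts a zone of the form x >= lo by its lower bound.

```python
from collections import deque


def reachable(initial, successors, is_accepting, subsumed_by):
    """Breadth-first enumeration of nodes with simulation-based pruning.

    A node is dropped when some stored node simulates it; finiteness of the
    simulation is what guarantees that only finitely many nodes are stored.
    Returns True iff an accepting node is found.
    """
    stored = []
    queue = deque([initial])
    while queue:
        node = queue.popleft()
        if is_accepting(node):
            return True
        if any(subsumed_by(node, old) for old in stored):
            continue
        stored.append(node)
        queue.extend(successors(node))
    return False


# Toy usage: node (q, lo) stands for the zone {x >= lo} in state q.
def grow(node):
    """A loop that increases the lower bound forever (infinite zone graph)."""
    q, lo = node
    return [(q, lo + 1)]


def grow_or_goal(node):
    """Same loop, plus a transition to 'goal' guarded by x >= 2."""
    q, lo = node
    extra = [("goal", max(lo, 2))] if q == "q" else []
    return [(q, lo + 1)] + extra


def accepting(node):
    return node[0] == "goal"


def included(a, b):
    """Zone inclusion: {x >= lo_a} is contained in {x >= lo_b}."""
    return a[0] == b[0] and a[1] >= b[1]
```

In the toy example the unpruned graph is infinite, yet the search stops after storing a single node because every later zone is included in the first one.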


Now, the reachability algorithm, as in TA, enumerates the nodes of the GTA zone graph and uses the simulation $\preceq_{\mathcal{A}}$ from Theorem 5 to truncate nodes that are smaller with respect to the simulation. In Sect. 7, we will show that $\preceq_{\mathcal{A}}$ is finite when $\mathcal{A}$ is safe, which implies that the reachability algorithm terminates. But before that we discuss the issue of implementability.

6 Computing with GTA Zones Using Distance Graphs


To implement the reachability algorithm described above, we will view zones as 
distance graphs, as is usually done in the literature [12]. 

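Recall the notion of weights $\mathcal{C} = \{(\triangleleft, c) \mid c \in \overline{\mathbb{R}} \text{ and } \triangleleft \in \{\le, <\}\}$. An order relation $<$ between weights is defined as $(\triangleleft, c) < (\triangleleft', c')$ when either (1) $c < c'$, or (2) $c = c'$ and $\triangleleft$ is $<$ while $\triangleleft'$ is $\le$. Note that since $(<, -\infty) < (\le, -\infty) < (\triangleleft, c) < (<, \infty) < (\le, \infty)$ for all $c \in \mathbb{R}$, this relation is a total order and therefore the min of a finite set of weights is well defined. We also use the commutative and associative sum operation on weights defined in [4]. If $c, c' \in \mathbb{R}$ are finite, the definition is as usual: $(\triangleleft, c) + (\triangleleft', c') = (\triangleleft'', c + c')$ where $\triangleleft''$ is $\le$ if $\triangleleft = \triangleleft' = \le$ and $\triangleleft''$ is $<$ otherwise. Infinite weights $\alpha, \beta$ from the list $(<, +\infty), (\le, -\infty), (\le, +\infty), (<, -\infty)$ are all 'absorbants' w.r.t. weaker weights: $\alpha + \beta = \beta + \alpha = \alpha$ if $\alpha$ is stronger than $\beta$ (i.e., $\alpha$ is listed after $\beta$). Also, $\alpha + (\triangleleft, c) = \alpha$ if $c \in \mathbb{R}$ is finite.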
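The order on weights and the sum of finite weights can be written in a few lines. This sketch represents a weight as a pair `(op, c)` with `op` either `"<"` or `"<="`; the function names are chosen for the example, and the absorption rules for infinite weights are deliberately left out (the sum raises an error on infinite constants).

```python
import math

LT, LE = "<", "<="


def key(w):
    """Sort key realising the total order: compare c first, then '<' before '<='."""
    op, c = w
    return (c, 0 if op == LT else 1)


def less(w1, w2):
    """Strict order on weights."""
    return key(w1) < key(w2)


def wmin(weights):
    """Minimum of a non-empty finite collection of weights."""
    return min(weights, key=key)


def add_finite(w1, w2):
    """Sum of two weights with finite constants.

    The result is non-strict only if both summands are non-strict.
    """
    (op1, c1), (op2, c2) = w1, w2
    if math.isinf(c1) or math.isinf(c2):
        raise ValueError("infinite weights follow the absorption rules of [4]")
    return (LE if op1 == op2 == LE else LT, c1 + c2)
```

For example, `add_finite(("<", 1), ("<=", 2))` gives `("<", 3)`, and `("<", 3)` is smaller than `("<=", 3)`.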

A distance graph $\mathbb{G}$ is a weighted directed graph without self-loops, with vertex set $X \cup \{0\} = X_F \cup X_H \cup \{0\}$, and edges labeled with weights from $\mathcal{C} \setminus \{(<, -\infty)\}$. We define its semantics $[\![\mathbb{G}]\!] := \{v \in V \mid v \models y - x \triangleleft c \text{ for all edges } x \xrightarrow{(\triangleleft, c)} y \text{ in } \mathbb{G}\}$. The weight of edge $x \to y$ is denoted $\mathbb{G}_{xy}$ and we set $\mathbb{G}_{xy} = (\le, \infty)$ if there is no edge $x \to y$. The weight of a path is the sum of the weights of its edges. A cycle in $\mathbb{G}$ is said to be negative if its weight is strictly less than $(\le, 0)$.

In classical timed automata, the significance of distance graphs stems from the observation that a distance graph has no negative cycles iff its semantics is non-empty. This property does not immediately hold for distance graphs over the extended algebra [4, Section 4.2]. However, we can convert a distance graph $\mathbb{G}$ (in time polynomial in the number of clocks) into a standard form where this characterization continues to hold. First, we set $\mathbb{G}'_{0x} = \min(\mathbb{G}_{0x}, (\le, 0))$ for $x \in X_F$ and $\mathbb{G}'_{x0} = \min(\mathbb{G}_{x0}, (\le, 0))$ for $x \in X_H$. Moreover, if $x \in X_F$ then we set $\mathbb{G}'_{x0} = \min(\mathbb{G}_{x0}, (<, \infty))$ if $\mathbb{G}_{xy} \ne (\le, \infty)$ for some $y \ne x$, otherwise we keep $\mathbb{G}'_{x0} = \mathbb{G}_{x0}$. Similarly, if $y \in X_H$ then we set $\mathbb{G}'_{0y} = \min(\mathbb{G}_{0y}, (<, \infty))$ if $\mathbb{G}_{xy} \ne (\le, \infty)$ for some $x \ne y$, otherwise we keep $\mathbb{G}'_{0y} = \mathbb{G}_{0y}$. Finally, for $x, y \in X$ with $x \ne y$ we set $\mathbb{G}'_{xy} = \mathbb{G}_{xy}$. The graph $\mathbb{G}'$ constructed above is called the standardization of $\mathbb{G}$. It is equivalent to $\mathbb{G}$ (i.e., $[\![\mathbb{G}']\!] = [\![\mathbb{G}]\!]$) and it has a negative cycle iff its semantics $[\![\mathbb{G}]\!]$ is empty [4].

Now, suppose $\mathbb{G}'$ (in standard form) has no negative cycles. Then we construct $\mathbb{G}''$ by replacing the weight of an edge $x \to y$ by the minimum of the weights of the paths from $x$ to $y$ in $\mathbb{G}'$. Such a $\mathbb{G}''$ is called the normalization of $\mathbb{G}'$ and has several useful properties.
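Normalization is an all-pairs shortest-path computation over weights. The sketch below uses a Floyd-Warshall loop and is restricted, as an assumption of this example, to graphs whose edges carry finite weights, with a missing edge standing for the trivial weight (<=, inf) that absorbs any finite weight it is added to. It also assumes that the input has no negative cycles.

```python
import math
from itertools import product

NO_EDGE = ("<=", math.inf)


def key(w):
    """Sort key for the total order on weights."""
    op, c = w
    return (c, 0 if op == "<" else 1)


def add(w1, w2):
    """Sum of weights, for finite weights and NO_EDGE only."""
    if w1 == NO_EDGE or w2 == NO_EDGE:
        return NO_EDGE
    (op1, c1), (op2, c2) = w1, w2
    return ("<=" if op1 == op2 == "<=" else "<", c1 + c2)


def normalize(nodes, G):
    """Replace each edge weight by the minimum weight of a path between its ends.

    nodes -- list of vertices (clocks and the zero clock)
    G     -- dict mapping (x, y), x != y, to a weight; missing pairs mean NO_EDGE
    """
    D = {(x, y): G.get((x, y), NO_EDGE) for x in nodes for y in nodes if x != y}
    # product varies its first component slowest, so k is the outermost loop.
    for k, x, y in product(nodes, repeat=3):
        if len({k, x, y}) == 3:
            via = add(D[(x, k)], D[(k, y)])
            if key(via) < key(D[(x, y)]):
                D[(x, y)] = via
    return D
```

With edges 0 -> x of weight (<=, 2), x -> y of weight (<, 3) and 0 -> y of weight (<=, 10), normalization tightens the edge 0 -> y to (<, 5).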

Let $Z$ be a nonempty zone. Writing the constraints in $Z$ as a distance graph, followed by standardizing and normalizing it, results in its canonical distance graph $\mathbb{G}(Z)$: $[\![\mathbb{G}(Z)]\!] = Z$ and $\mathbb{G}(Z)$ is minimal among the standard graphs $\mathbb{G}$ with $[\![\mathbb{G}]\!] = Z$. We denote by $Z_{xy}$ the weight of the edge $x \to y$ in $\mathbb{G}(Z)$.

[3] contains the algorithms for the zone operations when there are no diagonal constraints. Successor computation can be done in $O(|X|^2 \cdot |g|)$ and the simulation in $O(|X|^2)$. Incorporating intersection with diagonal constraints requires an additional standardization step, since diagonal constraints may break the standard form. A detailed explanation of the successor computation of zones is provided in [2]. For the simulation, the algorithm from [26] is used. However, in the presence of diagonal constraints, the simulation check becomes NP-complete in general, and the implementation makes use of heuristics that allow for a faster check in practice. What remains is to show that $\preceq_{\mathcal{A}}$ is a finite simulation for $X_D$-safe GTA.


7 Finiteness of the Simulation Relation 


In this section, we show that the simulation relation $\preceq_{\mathcal{A}}$ proposed in Sect. 5 is finite for safe GTA, which proves termination of the symbolic enumeration-based reachability algorithm. We do this in two parts: first, we show that the zones that are reached during the enumeration satisfy some invariants, in particular, only finitely many values occur in constraints among future clocks. This is however not necessarily true for history clocks. There the simulation comes into play. In the second part of the proof, we combine the invariants with an equivalence relation to show finiteness of the simulation. Below, we sketch these arguments and provide intuition, leaving formal details to [2] due to lack of space.

Throughout this section, we fix an $X_D$-safe GTA $\mathcal{A}$. Let $M = \max\{|c| \mid c \in \mathbb{Z} \text{ is used in some constraint of } \mathcal{A}\}$, called the maximal constant of $\mathcal{A}$. We say that a zone $Z$ is reachable if there is some reachable node $(q, Z)$ in GZG($\mathcal{A}$).


Part 1: Invariants on zones. We start by showing an important property of reachable zones: closure under valuations that agree on the value of history clocks, and satisfy the same set of safe constraints involving non-history clocks.

We say that a constraint $x - y \triangleleft c$ is $M$-bounded if either $c \in \mathbb{R}$ is such that $|c| \le M$ or $c \in \{-\infty, +\infty\}$. It is $X_D$-safe if $x, y \in X_F$ implies $x, y \in X_D$. We say that it is $(X_D, M)$-safe if it is both $M$-bounded and $X_D$-safe.


Lemma 8. Let $v, v' \in V$ be such that $v'{\downarrow}_{X_H} = v{\downarrow}_{X_H}$ and, for all $(X_D, M)$-safe constraints $y - x \triangleleft c$ with $x, y \in X_F \cup \{0\}$, we have $v' \models y - x \triangleleft c$ if and only if $v \models y - x \triangleleft c$. Let $Z$ be a reachable zone. Then, $v \in Z$ if and only if $v' \in Z$.


The proof (given in [2]) works by establishing that the property is true in the initial zone, and showing that it is invariant under the zone operations used to compute GZG($\mathcal{A}$). This proof crucially uses the fact that $\mathcal{A}$ is $X_D$-safe. For the case of releasing a clock $x \in X_F \setminus X_D$, we use the fact that a diagonal constraint involving $x$ may not use another future clock. For the case of releasing a clock $x \in X_D$, we use the fact that the value of the clock must be 0 or $-\infty$ just before the release. As a non-example, consider Fig. 3. Here, $X_D = \{y, z\}$ and $M = 1$. After two iterations of $a$, the zone $Z_2$ reached is $x = 0 \wedge y = z = -2$. Pick $v$: $x = 0, y = z = -2$ and $v'$: $x = 0, y = z = -3$. Notice that both of them satisfy the same set of $(X_D, M)$-safe constraints, but $v \in Z_2$, $v' \notin Z_2$. Indeed, the automaton is not $X_D$-safe since $y$ and $z$ are released arbitrarily.

From Lemma 8, we get the following corollary (with a more precise statement and proof in [2]). Namely, if a reachable zone $Z$ contains a valuation $v$ in which the difference between two future clocks $x, y$ (including the zero clock) is finite and large enough, then $Z$ contains valuations where the difference between $x$ and $y$ is any finite and large enough value.

Corollary 9. Let $Z$ be a reachable zone and let $v \in Z$. Let $n = \max(1, |X_D|)$. For all $x, y \in X_F \cup \{0\}$, if $-\infty < v(x) - v(y) < -nM$ then, for every $\alpha$ with $-\infty < \alpha < -nM$, we have a valuation $v' \in Z$ with $v'(x) - v'(y) = \alpha$.


Notice that the property above does not hold if we simply take $n = 1$. For instance, if we have two clocks $x, z \in X_D$ then, applying the $(X_D, M)$-safe program $([x, z]; z = -M \wedge x - z = -M)$ from $V$ results in a zone $Z$ where all valuations $v$ satisfy $v(x) = -2M$. So the property fails with $n = 1$, $x$ and $y = 0$. This is a noteworthy difference between models with and without diagonals.

Using Corollary 9, we can prove the main invariants satisfied by the zones obtained during the enumeration. Essentially, the weights of edges involving non-history clocks come from a finite set which depends on the number of future clocks in $X_D$ and the maximum constant $M$ of the automaton. This also induces an invariant on the constraint between a history clock and a future clock.

Before stating the result, we first give two technical lemmas from [4] that we use extensively in the proof.


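Lemma 10 ([4]).

1. Let $(\triangleleft, c)$ be a weight and $\alpha \in \overline{\mathbb{R}}$. Then,
   - $\alpha \triangleleft c$ iff $(\le, \alpha) \le (\triangleleft, c)$ iff $(\le, 0) \le (\le, -\alpha) + (\triangleleft, c)$,
   - $\alpha \not\triangleleft c$ iff $(\triangleleft, c) < (\le, \alpha)$ iff $(\le, -\alpha) + (\triangleleft, c) < (\le, 0)$ iff $(\le, -\alpha) + (\triangleleft, c) \le (<, 0)$.
2. Let $(\triangleleft, c), (\triangleleft', c'), (\triangleleft'', c'')$ be weights with $(\le, 0) \le (\triangleleft, c) + (\triangleleft', c')$. Then, there exists $\alpha \in \overline{\mathbb{R}}$ such that $\alpha \triangleleft c$ and $-\alpha \triangleleft' c'$. If in addition we have $(\triangleleft'', c'') < (\triangleleft, c)$ then there exists such an $\alpha$ with $\alpha \not\triangleleft'' c''$.


Lemma 11 ([4]). Let $\mathbb{G} = \mathbb{G}(Z)$ for a non-empty GTA zone $Z$, and let $x, y \in X \cup \{0\}$ be a pair of distinct nodes and $\alpha \in \overline{\mathbb{R}}$. There is a valuation $v \in [\![\mathbb{G}]\!]$ with $v(y) - v(x) = \alpha$ if and only if

1. $(\le, \alpha) \le \mathbb{G}_{xy}$ and $(\le, -\alpha) \le \mathbb{G}_{yx}$, and

2. if $x, y \in X$ and $\alpha \in \mathbb{R}$ is finite then the weights $\mathbb{G}_{x0}, \mathbb{G}_{0x}, \mathbb{G}_{y0}, \mathbb{G}_{0y}$ are all different from $(\le, -\infty)$, and

3. if $x, y \in X$ and $\alpha = -\infty$ then $\mathbb{G}_{0x} \ne (\le, -\infty) \ne \mathbb{G}_{y0}$.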


Lemma 12. Let $Z$ be a nonempty reachable zone. Let $n = \max(1, |X_D|)$. Then, the normalized distance graph $\mathbb{G}(Z)$ satisfies the following ($\dagger$) conditions:

$\dagger_1$ For all $x \in X_F$, $y \in X_H \cup \{0\}$, if $Z_{xy}$ is finite, then $(\le, 0) \le Z_{x0} \le (\le, nM)$.

$\dagger_2$ For all $x \in X_F$, if $Z_{0x}$ is finite, then $(<, -nM) \le Z_{0x} \le (\le, 0)$.

$\dagger_3$ For all $x \in X_H$, $y \in X_F$, if $Z_{xy}$ is finite, then $Z_{x0} + (<, -nM) \le Z_{xy}$.

$\dagger_4$ For $x, y \in X_F$, if $Z_{xy}$ is finite, then $(<, -nM) \le Z_{xy} \le (\le, nM)$.


Proof. We focus on $\dagger_1$, $\dagger_2$, leaving the more complicated cases to [2].

$\dagger_1$ First, we consider the case where $y = 0$. So we assume that $Z_{x0}$ is finite, i.e., $(\le, 0) \le Z_{x0} < (<, \infty)$. Towards a contradiction, suppose that $(\le, nM) < Z_{x0} < (<, \infty)$. Since $Z$ is non-empty, we know that $(\le, 0) \le Z_{x0} + Z_{0x}$. Then, using Lemma 10, we can find $\alpha \in \overline{\mathbb{R}}$ such that $(\le, \alpha) \le Z_{x0}$, $(\le, -\alpha) \le Z_{0x}$, and $nM < \alpha$. Notice that $\alpha < \infty$ since $Z_{x0} < (<, \infty)$. Further, using Lemma 11, we can get a valuation $v \in Z$ such that $0 - v(x) = \alpha$. Since $nM < \alpha < \infty$, this implies $-\infty < v(x) < -nM$. Let $Z_{x0} = (\triangleleft, c)$. We have $nM < c < \infty$. Using Corollary 9, we can get a valuation $v' \in Z$ such that $-\infty < v'(x) < -c$, a contradiction as it violates the constraint $0 - x \triangleleft c$ of $Z$. Next, assume that $Z_{xy} < (<, \infty)$ for some $y \in X_H$. Since $Z$ is normal, we have $Z_{x0} \le Z_{xy} + Z_{y0} < (<, \infty)$ as $Z_{xy} < (<, \infty)$ and $Z_{y0} \le (\le, 0)$. We now conclude from the first case that $(\le, 0) \le Z_{x0} \le (\le, nM)$.

$\dagger_2$ We have to show that either $Z_{0x} = (\le, -\infty)$ or $(<, -nM) \le Z_{0x} \le (\le, 0)$. Let $Z_{0x} = (\triangleleft, c)$. Suppose $(\le, -\infty) < Z_{0x} < (<, -nM)$. We have $-\infty < c < -nM$. As before, we can find $\alpha$ such that $(\le, \alpha) \le Z_{0x}$, $(\le, -\alpha) \le Z_{x0}$ and $\alpha \ne -\infty$. Then, by Lemma 11, we can find $v \in Z$ with $v(x) = \alpha$. We have $-\infty < v(x) \triangleleft c < -nM$. Now, using Corollary 9, we can get a valuation $v' \in Z$ such that $c < v'(x) < -nM$, which leads to a contradiction as it violates the constraint $x - 0 \triangleleft c$ in the zone.


Part 2. Equivalence and Finiteness. We introduce below an equivalence relation $\sim^n_M$ of finite index on valuations, depending on $n = \max(1, |X_D|)$ and the maximal constant $M$, and show that, if $G$ is a set of atomic $M$-bounded integral constraints and if $Z$ is a zone such that its canonical distance graph $\mathbb{G}(Z)$ satisfies the ($\dagger$) conditions, then the downward closure ${\downarrow}_G Z = \{v \in V \mid \exists v' \in Z \text{ with } v \preceq_G v'\}$ is a union of $\sim^n_M$ equivalence classes.

First, we define $\sim_M$ on $\alpha, \beta \in \overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$ by $\alpha \sim_M \beta$ if $(\alpha \triangleleft c \iff \beta \triangleleft c)$ for all $(\triangleleft, c)$ with $\triangleleft \in \{<, \le\}$ and $c \in \{-\infty, \infty\} \cup \{d \in \mathbb{Z} \mid |d| \le M\}$. In particular, if $\alpha \sim_M \beta$ then $(\alpha = -\infty \iff \beta = -\infty)$ and $(\alpha = \infty \iff \beta = \infty)$.

Next, for valuations $v_1, v_2 \in V$, we define $v_1 \sim^n_M v_2$ by two conditions: $v_1(x) \sim_{nM} v_2(x)$ and $v_1(x) - v_1(y) \sim_{(n+1)M} v_2(x) - v_2(y)$ for all clocks $x, y \in X$. Notice that we use $(n + 1)M$ for differences of values. Clearly, $\sim^n_M$ is an equivalence relation of finite index on valuations. Using this, we can show that the zones that are reachable in a safe GTA are unions of $\sim^n_M$-equivalence classes.
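The two equivalences can be tested directly from their definitions. In the sketch below, `profile` records the truth value of every comparison against the relevant constants, so two values are equivalent exactly when their profiles coincide. The function names are invented for this example, and differences of two equal infinite values are rejected because their convention is not fixed in this section.

```python
import math
from itertools import product


def profile(a, M):
    """Truth values of a < c and a <= c for c in {-inf, inf} and integers |c| <= M."""
    consts = [-math.inf, *range(-M, M + 1), math.inf]
    return tuple((a < c, a <= c) for c in consts)


def equiv(a, b, M):
    """a ~_M b on extended reals."""
    return profile(a, M) == profile(b, M)


def diff(a, b):
    """a - b, refusing the undefined case of equal infinities."""
    d = a - b
    if math.isnan(d):
        raise ValueError("difference of equal infinities is not covered by this sketch")
    return d


def equiv_valuations(v1, v2, n, M):
    """v1 ~^n_M v2: values compared up to nM, differences up to (n + 1)M."""
    clocks = list(v1)
    if any(not equiv(v1[x], v2[x], n * M) for x in clocks):
        return False
    return all(
        equiv(diff(v1[x], v1[y]), diff(v2[x], v2[y]), (n + 1) * M)
        for x, y in product(clocks, repeat=2)
        if x != y
    )
```

Since each profile is one of finitely many tuples of booleans, the finite index of the relation is immediate from this formulation.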


Lemma 13. Let $G$ be a set of $X_D$-safe $M$-bounded integral constraints which contains both $x \le 0$ and $0 \le x$ for each future clock $x \in X_F$. Let $Z$ be a zone with a canonical distance graph $\mathbb{G}(Z)$ satisfying the ($\dagger$) conditions of Lemma 12. Let $v_1, v_2 \in V$ be valuations with $v_1 \sim^n_M v_2$. Then, $v_1 \in {\downarrow}_G Z$ iff $v_2 \in {\downarrow}_G Z$.


Finally, from Lemmas 12 and 13, we obtain our main theorem of the section.

Theorem 14. The simulation relation $\preceq_{\mathcal{A}}$ is finite if $\mathcal{A}$ is safe.


Proof. Let $(q, Z_0), (q, Z_1), (q, Z_2), \ldots$ be an infinite sequence of reachable nodes in the zone graph of $\mathcal{A}$. By Lemma 12, for all $i$, the distance graph $\mathbb{G}(Z_i)$ in canonical form satisfies conditions ($\dagger$).

The set $\mathcal{G}(q)$ contains only $X_D$-safe and $M$-bounded integral constraints. Let $G$ be $\mathcal{G}(q)$ together with the constraints $x \le 0$ and $0 \le x$ for each future clock

Table 1. Experimental results obtained by running our prototype implementation and, when possible, the standard reachability algorithm using G-simulation implemented in TCHECKER. Both implementations use a breadth-first search with simulation. For each model, we give the parameters in parentheses: for ToyECA, we explain the parameterization in [2], while for others, we report the number of concurrent processes. All experiments were run on an Ubuntu machine with an Intel-i5 7th Generation processor and 8 GB RAM, with the timeout set to 60 s.


Sl. No | Model | G-Sim visited nodes | G-Sim stored nodes | G-Sim time (s) | GTA Reach visited nodes | GTA Reach stored nodes | GTA Reach time (s)
1 | Dining Phi. (6) | 5480 | 5480 | 4.911 | 5480 | 5480 | 6.410
2 | FDDI (10) | 10219 | 459 | 10.139 | 10219 | 459 | 16.797
3 | Fischer (10) | 447598 | 260998 | 29.1574 | 447598 | 260998 | 34.6517
4 | ToyECA(10000, 4) | 150049 | 49 | 4.22 | 3 | 3 | 0.0003
5 | ToyECA(5000, 6) | 315193 | 193 | 15.572 | 3 | 3 | 0.0006
6 | ToyECA(1000, 100) | - | - | TIMEOUT | 3 | 3 | 0.877
7 | ToyECA(50000, 120) | - | - | TIMEOUT | 3 | 3 | 1.52
8 | Fire-alarm-pattern(5) | - | - | - | 46 | 46 | 0.027
9 | CSMACD-bounded(1) | - | - | - | 34 | 26 | 0.0054
10 | CSMACD-bounded(4) | - | - | - | 4529 | 2068 | 2.597
11 | ABP-prop1(1) | - | - | - | 114 | 114 | 0.038
12 | ABP-prop2(1) | - | - | - | 168 | 168 | 0.026


$x \in X_F$. From Lemma 13 we deduce that for all $i$, ${\downarrow}_G Z_i$ is a union of $\sim^n_M$-classes. Since $\sim^n_M$ is of finite index, there are only finitely many unions of $\sim^n_M$-classes. Therefore, we find $i < j$ with ${\downarrow}_G Z_i = {\downarrow}_G Z_j$, which implies $Z_j \preceq_G Z_i$. Since $\mathcal{G}(q) \subseteq G$, this also implies $Z_j \preceq_{\mathcal{G}(q)} Z_i$.


8 Experimental Evaluation 


We have implemented a prototype that takes as input a GTA, as given in 
Definition 1, and applies our reachability algorithm, in the open source tool 
TCHECKER [29]. To do so, we extend TCHECKER to allow clocks to be declared 
as one of normal, history, prophecy, or timer, and extend the syntax of edges 
to allow arbitrary interleaving of guards and clock changes (reset/release). Our 
tool, along with the benchmarks used in this paper, is available and can be 
downloaded from https://github.com/EQuaVe/GTAReach. We present selected 
results in Table 1, with further details in [2]. 

First, we consider timed automata models from standard benchmarks 
[21,34,39]. Despite the overhead induced by our framework (e.g., maintaining 
general programs on transitions), we are only slightly worse off w.r.t. running 
time than the standard algorithm, while visiting and storing the same number 
of nodes. We illustrate this in rows 1-3 of Table 1 by providing a comparison of 
our tool with the implementation of the state-of-the-art zone-based reachability 
algorithm using G-simulation introduced in [24-26]. 

Next, we consider models belonging to the class of ECA without diagonal 
constraints. We remark that ours is the first implementation of a reachability 
algorithm that can operate on the whole class of ECA directly. We compare 
against an implementation that first translates the ECA into a timed automa- 
ton using the translation proposed in [10], and then runs the state-of-the-art 
reachability algorithm of [24-26] on this timed automaton. From rows 4-7 of 
Table 1, we observe significant improvements, both in terms of running time as 
well as number of visited nodes and stored nodes w.r.t. the standard approach. 

Finally, in rows 8-12, we consider the unified model GTA. As already pointed 
out, model-checking an event-clock specification φ over a timed automaton model 
A can be reduced to reachability on the product of the TA A and the ECA 
representing ¬φ. In this spirit, our implementation allows the model to use any 
combination of normal clocks, history clocks, prophecy clocks or timers and 
moreover, permits diagonal guards between any of these clocks. To the best of 
our knowledge, no existing tool allows all these features. We emphasize this by 
the — in the G-Sim column of Table 1. 

We model simple but useful properties using event-clocks, and check these 
properties on some standard models from the literature such as CSMACD [39], 
Fire-alarm [35] and Alternating-bit-protocol (ABP) [33]. Note that for the 
benchmark Fire-alarm-pattern, the specification is modelled using an ECA with 
diagonals. As a consequence, the product automaton that we check reachability on 
contains normal clocks and event-clocks. Here, we consider the following ECA 
specification: no three a’s occur within k time units. The negation of this 
property can be easily modeled by an ECA with two states and a transition on a with 
the diagonal constraint ←a − →a ≤ k, where ←a is the history clock recording time 
since the previous occurrence of a, and →a is a future clock predicting the time to 
the next a occurrence. When reading an a, the quantity ←a − →a gives the distance 
between the next and the previous occurrence. This language is used in [19] to 
observe that ECA with diagonals are more expressive than ECA. Finally, we 
remark that the model of ABP contains timers. A more detailed discussion of 
the model and specifications in these benchmarks is provided in [2]. 
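
As a rough illustration of the property discussed above (a sketch only, not the ECA encoding used in the tool), the following Python function checks a finite timed word directly. The function name, the representation of a timed word as a list of (timestamp, letter) pairs sorted by time, and the reading of "within k" as a non-strict bound are all assumptions made for this example. At each middle occurrence of a, it adds the time since the previous a to the time until the next a, which is the quantity the diagonal constraint bounds.

```python
from typing import List, Tuple


def violates_no_three_within(
    word: List[Tuple[float, str]], k: float, letter: str = "a"
) -> bool:
    """Return True if three occurrences of `letter` lie within k time units.

    `word` is assumed to be sorted by timestamp (a finite timed word).
    """
    times = [t for (t, sym) in word if sym == letter]
    # At the middle occurrence times[i]:
    #   since_prev plays the role of the history clock,
    #   to_next plays the role of the (time to the) future clock.
    for i in range(1, len(times) - 1):
        since_prev = times[i] - times[i - 1]
        to_next = times[i + 1] - times[i]
        if since_prev + to_next <= k:
            return True
    return False


if __name__ == "__main__":
    w = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (3.0, "a")]
    print(violates_no_three_within(w, 3.0))
    print(violates_no_three_within(w, 2.5))
```

The check needs only one pass over the occurrences of a, because the span of any three occurrences is at least the span of three consecutive ones.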

In conclusion, as can be seen from the experimental results in Table 1, we are 
able to demonstrate the full power of our reachability algorithm for the unified 
model of generalized timed automata. 


9 Conclusion 


The success of timed automata verification can safely be attributed to the 
advances in the zone-based technology over the last three decades. In fact, [22], 
the precursor to the seminal works [8,9], already laid the foundations for zones 
by describing the Difference-Bounds-Matrices (DBM) data structure. Our goal 
in this work has been to unify timing features defined in different timed models, 
while at the same time retain the ability to use efficient state-of-the-art algo- 
rithms for reachability. To do so, we have equipped the model with two kinds 
of clocks, history and future, and modified the transitions to contain a program 
that alternates between a guard and a change to the variables. For the algorith- 
mic part, we have adapted the G-simulation framework to this powerful model. 
The main challenge was to show finiteness of the simulation in this extended 
setting. To aid the practical use of this generic model, we have developed a pro- 
totype implementation that can answer reachability for GTA. We remark that 
decidability for GTA comes via zones, and not through regions. In fact, since we 
generalize event-clock automata, we do not have a finite region equivalence for 
GTA [28]. 

We conclude with some interesting avenues for future work. An immediate 
direction is to use generalized timed automata for model-checking timed 
specifications over real-time systems. Further, the complexity and expressivity of safe 
GTA are natural and interesting theoretical open questions, and we believe their 
answers are not obvious. Both these questions are answered in the timed automata literature 
using regions. However, we cannot have a region equivalence for our model, since 
even for the subclass of ECA, it was shown that no finite bisimulation is possible. 
In particular, it would be interesting to investigate if it is possible to have a 
translation from safe GTA to timed automata. Note that even if such a translation 
exists, it is likely to incur an exponential blowup since even the translation from 
ECA to TA costs an exponential. Coming to the complexity of the reachability 
problem for safe GTA, it is easy to see that our procedure runs in EXPSPACE, 
as we have shown that each reachable zone is a union of equivalence classes of 
a finite index (see Lemma 13). On the other hand, PSPACE-hardness is inherited 
from timed automata [6,8]. Closing the complexity gap is open. We note 
that even in timed automata, the precise complexity of the simulation-based 
reachability algorithm is difficult to analyze, but its selling point is that it works 
well in practice. Finally, we would also like to investigate liveness verification for 
GTA, in particular what future clocks bring us when we consider the setting of 
ω-words. 
Abstract. Deep neural networks (DNNs) are increasingly used in 
safety-critical autonomous systems as perception components processing 
high-dimensional image data. Formal analysis of these systems is par- 
ticularly challenging due to the complexity of the perception DNNs, the 
sensors (cameras), and the environment conditions. We present a case 
study applying formal probabilistic analysis techniques to an experimen- 
tal autonomous system that guides airplanes on taxiways using a percep- 
tion DNN. We address the above challenges by replacing the camera and 
the network with a compact abstraction whose transition probabilities 
are computed from the confusion matrices measuring the performance 
of the DNN on a representative image data set. As the probabilities 
are estimated based on empirical data, and thus are subject to error, 
we also compute confidence intervals in addition to point estimates for 
these probabilities and thereby strengthen the soundness of the analysis. 
We also show how to leverage local, DNN-specific analyses as run-time 
guards to filter out mis-behaving inputs and increase the safety of the 
overall system. Our findings are applicable to other autonomous systems 
that use complex DNNs for perception. 


1 Introduction 


Complex autonomous systems, such as autonomous aircraft taxiing systems [31] 
and autonomous cars [20,25,42], need to perceive and reason about their environments 
using high-dimensional data streams (such as images) generated by rich 
sensors (such as cameras). Machine-learnt components, especially deep neural 
networks (DNNs), are particularly capable of the required high-dimensional reasoning 
and hence are increasingly used for perception in these systems. While 
formal analysis of the safety of these systems is highly desirable due to their 
safety-critical operational settings and the error-prone nature of learned compo- 
nents, in practice this is very challenging because of the complexity of the system 
components, including the high complexity of the neural networks (which may 
have thousands or millions of parameters), the complexity of the camera capture 


© The Author(s) 2023 
C. Enea and A. Lal (Eds.): CAV 2023, LNCS 13964, pp. 289-303, 2023. 
https://doi.org/10.1007/978-3-031-37706-8_15 


290 C. S. Păsăreanu et al. 


process, and the random and hard to characterize nature of the environment in 
which the system operates (i.e., the world itself). 

In this work, we describe a formal analysis of a closed-loop autonomous 
system that addresses the above challenges. Our case study is motivated by a 
real-world application, namely, an experimental autonomous system for guiding 
airplanes on taxiways developed by Boeing [3,14]. The key idea is to abstract 
away altogether the perception components, namely, the perception network and 
the image generator, i.e., the camera taking images of the world, and replace 
them with a probabilistic component α that maps (abstractions of) the state of 
the system to state estimates that are used in downstream decision making in 
the closed-loop system. The resulting system can then be analyzed with standard 
(probabilistic) model checkers, such as PRISM [34] or STORM [22]. 

The approach is compositional, in the sense that the probabilistic component 
is computed separately from the rest of the system. The transition probabilities in 
α are derived based on confusion matrices computed for the DNN (measured on 
representative data sets). Developers routinely use confusion matrices to evaluate 
machine learning models, so our analysis is closely aligned with existing work- 
flows, facilitating its adoption in practice. 

The size of the probabilistic abstraction α is linear in the size of the output 
of the DNN, and is independent of the number of the DNN parameters or the 
complexity of the camera and the environment. We also describe how to leverage 
additional results obtained from analyzing the DNN in isolation to further refine 
the abstraction and also increase the safety of the closed-loop system through 
run-time guards. In particular, we leverage rules mined from the DNN model [17] 
to act as run-time guards for the closed-loop analysis, filtering out inputs that 
likely lead to invalid DNN behavior. Other methods can also be used (e.g., 
[17,18,21,26,32,35]) to catch adversarial or out-of-distribution inputs. 

The probabilities in α are estimated based on empirical data, so they are 
subject to error. We explore the use of confidence intervals in addition to point 
estimates for these probabilities and thereby strengthen the soundness of the 
analysis [5,7]. Our technique is applicable to other autonomous systems that use 
DNN-based perception from high-dimensional data. 


Related Work. Formal proofs of closed-loop safety have been obtained for 
systems with low-dimensional sensor readings [11,12,27-30,40]; however, they 
become intractable for systems that use rich sensors producing high-dimensional 
inputs such as images. 

Other works address the modeling and scalability challenges by constructing 
abstractions of the perception components [24,33]. To model different environ- 
ment conditions, these abstract models use non-deterministic transitions. The 
resulting closed-loop systems are analyzed with traditional (non-probabilistic) 
techniques. The abstractions either lack soundness proofs [33] or come with only 
probabilistic soundness guarantees [24] which do not translate into probabilistic 
guarantees over the safety of the overall system. VerifAI [16] can find 
counterexamples to system safety, but cannot provide guarantees. 

The recent work in [36] aims to verify the safety of the trajectories of a 
camera-based autonomous vehicle in a given 3D-scene. The work uses invariant 
regions over the input space grouped based on the same controller action. 
However, their abstraction captures only one environment condition (i.e., one scene) 
and one camera model, whereas our approach is not particular to a camera model 
and implicitly considers all the possible environment conditions. 

In contrast to previous work, we describe a formal analysis that is probabilis- 
tic, which we believe is natural since the camera images capturing the state of 
the world are subject to randomness due to the environment; further, DNNs are 
learnt from data and are not guaranteed to be 100% accurate. Recent work [2] 
also discusses the use of classification metrics, such as confusion matrices, for 
quantitative system-level analysis with temporal logic specifications. However, 
the work does not discuss the computation of confidence intervals that is 
necessary for quantifying the empirical results. Also, it does not incorporate 
DNN-specific analyses as we do here. We build on our previous work DEEPDECS [6], 
where the goal is to perform controller synthesis with safety guarantees, so the 
formalism is more involved. Furthermore, DEEPDECS does not consider con- 
fidence interval analysis, which we explore here based on some of our other 
previous works [5,7]. We analyzed center-line tracking using TaxiNet in [31]. 
That work focuses on the analysis of the network and not on the overall system. 


2 Autonomous Center-Line Tracking with TaxiNet 


Boeing is developing an experimental autonomous system for center-line tracking 
on taxiways in an airport. The system uses a neural network called TaxiNet for 
perception. TaxiNet is designed to take a picture of the taxiway as input and 
return the plane’s position with respect to the center-line on the taxiway. It 
returns two outputs: cross track error (cte), which is the distance in meters 
of the plane from the center-line, and heading error (he), which is the angle in 
degrees of the plane with respect to the center-line. These outputs are fed to 
a controller which in turn manoeuvres the plane such that it remains close to 
the center of the taxiway. This forms a closed-loop system where the perception 
network continuously receives images as the plane moves on the taxiway. We use 
this system as a case study and also as a running example throughout the paper. 


System Decomposition. The decomposition of this system is illustrated in 
Fig. 1. The controller sends actions a to the airplane to guide it on the 
taxiway. The dynamics (which models the movement of the airplane on the airport 
surface) maps previous state s and action a to the next state s′.¹ Information 
about the taxiway is provided by the perception network (p), i.e., TaxiNet. The 
perception network takes high-dimensional images captured with a camera (c), 
and returns its estimation s_est of the real state s. 

For our application, state s ∈ S captures the position of the airplane on 
the surface; S is modeled as CTE × HE. The network estimates the state s := 
(cte, he) based on images taken with a camera placed on the airplane. If the 
network is ‘perfect’, then s = s_est.² However, this does not hold in practice. 


¹ Velocity may be provided as feedback to the controller; we ignore it here for simplicity. 
² Assuming the relevant state of the system is recoverable from the input image. 
The network is trained on a finite set of images and is not guaranteed to be 
100% accurate, whereas images observed in operation show a wide variety due to 
different environment (e.g., light, weather) conditions and imperfections in the 
camera. 


[Figure placeholder: two block diagrams. Fig. 1 (closed-loop system): camera (c), perception network (p) producing s_est, controller, and airplane dynamics. Fig. 2 (abstracted system): probabilistic abstraction built from confusion matrices and producing s_est, run-time guard, controller, and dynamics; the abstracted system is the input to the probabilistic analysis.]

Fig. 1. Closed-loop System          Fig. 2. Abstracted System 


Component Modeling. We built a simple discrete model of the airplane 
dynamics and a discrete-time controller for the system, similar to previous 
related work [4,23] which also considers discretized control. Since the controller 
is discretized, we abstract the regression outputs of TaxiNet to view the model 
as a classifier which predicts the plane’s position in discrete states. Treatment of 
more complex systems with continuous semantics and regression models is left for 
future work. The main challenge that we address in the paper is the modeling of 
the perception components (the camera capture process and the network), which 
we describe in detail in the next section. We model the (abstracted) autonomous 
system as a Discrete Time Markov Chain (DTMC) [38]; the code for the models 
is provided in the appendix of an extended version of this paper [37]. 


Safety Properties. In our study, the goal is to provide guarantees for safe 
behavior with respect to two system-level properties indicated by our industrial 
partner. The properties specify conditions for safe operation in terms of allowed 
cte and he values for the airplane, by using taxiway dimensions. The first 
property states that the airplane shall never leave the taxiway (i.e., |cte| ≤ 8 meters). 
The second property states that the airplane shall never turn more than a 
prescribed degree (i.e., |he| ≤ 35°), as it would be difficult to maneuver the airplane 
from that position. These two properties can be encoded in PCTL [8] as follows. 


P =? [F (|cte| > 8 m)]    (Property 1) 


P =? [F (|he| > 35°)]    (Property 2) 


Here P =? indicates that we want to calculate the probability that eventually 
(F) the system reaches an error state. 
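
To make the meaning of such a reachability query concrete, here is a minimal sketch of how the probability of eventually reaching an error state can be approximated on a small DTMC by value iteration. This is not how PRISM is invoked in the paper; the three-state chain and all its transition probabilities are hypothetical and chosen only so that the answer can be checked by hand.

```python
from typing import Dict, Set


def prob_eventually(
    P: Dict[int, Dict[int, float]],
    bad: Set[int],
    max_iters: int = 10_000,
    tol: float = 1e-12,
) -> Dict[int, float]:
    """Approximate Pr[F bad] from every state of a DTMC by value iteration.

    P[s][t] is the probability of moving from state s to state t.
    Starting from 0 on all non-bad states converges to the least fixed point,
    which is the reachability probability.
    """
    x = {s: (1.0 if s in bad else 0.0) for s in P}
    for _ in range(max_iters):
        new = {
            s: 1.0 if s in bad else sum(p * x[t] for t, p in P[s].items())
            for s in P
        }
        delta = max(abs(new[s] - x[s]) for s in P)
        x = new
        if delta < tol:
            break
    return x


# Hypothetical toy chain: 0 = operating, 1 = error (absorbing), 2 = stopped safely.
toy = {
    0: {0: 0.5, 1: 0.1, 2: 0.4},
    1: {1: 1.0},
    2: {2: 1.0},
}
result = prob_eventually(toy, bad={1})

if __name__ == "__main__":
    print(round(result[0], 6))
```

For this toy chain the exact value from state 0 is 0.1 / (1 - 0.5) = 0.2, which the iteration approaches from below.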


TaxiNet DNN. This is a regression model with 24 layers including five con- 
volution layers, and three dense layers (with 100/50/10 ELU neurons) before 
the output layer. The inputs to the model are RGB color images of size 360 x 
200 pixels. We use a representative data set with 11108 images, shared by our 
industry partner. The model has a Mean Absolute Error (MAE) of 1.185 for 
cte and 7.86 for he outputs respectively. The discrete nature of the controller in 
our DTMCs induces a discretization on TaxiNet’s outputs and the treatment of 
TaxiNet as a classifier for the purpose of our analysis. cte ∈ [−8.0 m, 8.0 m] and 
he ∈ [−35.0°, 35.0°] are translated into cte ∈ {0,1,2,3,4} and he ∈ {0,1,2} as 
shown below. 


\[
\mathit{cte} =
\begin{cases}
3 & \text{if } -8.0\,\mathrm{m} \le \mathit{cte} < -4.8\,\mathrm{m} \\
1 & \text{if } -4.8\,\mathrm{m} \le \mathit{cte} < -1.6\,\mathrm{m} \\
0 & \text{if } -1.6\,\mathrm{m} \le \mathit{cte} \le 1.6\,\mathrm{m} \\
2 & \text{if } 1.6\,\mathrm{m} < \mathit{cte} \le 4.8\,\mathrm{m} \\
4 & \text{if } 4.8\,\mathrm{m} < \mathit{cte} \le 8.0\,\mathrm{m}
\end{cases}
\qquad
\mathit{he} =
\begin{cases}
1 & \text{if } -35.0^\circ \le \mathit{he} < -11.67^\circ \\
0 & \text{if } -11.67^\circ \le \mathit{he} \le 11.66^\circ \\
2 & \text{if } 11.66^\circ < \mathit{he} \le 35.0^\circ
\end{cases}
\]

We use label “−1” to denote error states, i.e., cte = −1 iff |cte| > 8 m and 
he = −1 iff |he| > 35°. For simplicity, we use cte and he to denote both the 
classifier and regression outputs in other parts of the paper (with meaning clear 
from context). Note that none of the input images are labeled by the classifier as 
“−1”, as the outputs of the network are normalized to be within the prescribed 
bounds; however, this does not preclude the system from reaching an error. 
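
The discretization above can be sketched in a few lines of Python. The function names are illustrative and not taken from the paper's models; the thresholds and class labels are the ones listed in the text, with −1 used for values outside the prescribed bounds.

```python
def classify_cte(cte: float) -> int:
    """Map a cross track error in meters to its discrete class."""
    if abs(cte) > 8.0:
        return -1  # error state: off the taxiway
    if cte < -4.8:
        return 3
    if cte < -1.6:
        return 1
    if cte <= 1.6:
        return 0
    if cte <= 4.8:
        return 2
    return 4


def classify_he(he: float) -> int:
    """Map a heading error in degrees to its discrete class."""
    if abs(he) > 35.0:
        return -1  # error state: turned too far
    if he < -11.67:
        return 1
    if he <= 11.66:
        return 0
    return 2


if __name__ == "__main__":
    print(classify_cte(0.3), classify_cte(-5.0), classify_cte(9.0))
    print(classify_he(0.0), classify_he(-20.0), classify_he(20.0))
```

Ordering the comparisons from the lowest interval upwards keeps each boundary on the side the text assigns it to, for example −4.8 m falls in class 1 and 1.6 m in class 0.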


3 Probabilistic Analysis 


In this section, we describe the methodology for abstracting and analyzing an 
autonomous system leveraging probabilistic model checking. The main idea, 
which we initially explored in [6], is to replace the composition p ∘ c of the 
camera (denoted as c) and the perception DNN (denoted as p) with a conservative 
abstraction mapping each system state to every possible estimated state; the 
transition probabilities are derived empirically based on the confusion matrices 
computed for the DNN, on a representative data set. We denote this abstraction 
as α: S → D(S), mapping system states to a discrete distribution over 
(estimated) system states. Figure 2 depicts the abstracted autonomous system. 

We observe that c can be viewed as a map from a state s ∈ S to a distribution 
over images, denoted as D(Img), where img ∈ Img and Img is the set 
of images. For instance, in the TaxiNet system, state s only captures the position 
of the airplane with respect to the center-line, but there are many different 
images that correspond to the same position. This is due to uncontrollable 
environmental conditions, such as temporary sensor failures or different lighting and 
weather conditions. Consequently, a single state s can map to a number of different 
images depending on the environment, and this is modeled by considering 
c to be a probabilistic map of type S → D(Img). Given a system state s, α(s) 
models the probability of p ∘ c leading to a particular estimated state s_est; α 
needs to be probabilistic because c itself is probabilistic and p is not perfectly 
accurate. 

We further describe how we can leverage DNN-specific analysis to improve 
the accuracy of perception and the safety of the overall system, via the optional 
addition of run-time guards. For the verification of the closed-loop system, we 
use the PRISM model checking tool [34]. We also explore methods for analysis 
of DTMCs with uncertain transition probabilities [5,7], to obtain probabilistic 
guarantees about the validity of our probabilistic safety proofs even though the 
abstraction probabilities are empirical estimates. 


Assumptions. Our analysis assumes that the distribution of inputs to the network 
remains fixed over time (i.e., it is not subject to distribution shifts). Moreover, 
the data set of input images used to estimate the probabilities in α is 
assumed to be representative, i.e., constituted of independently drawn samples 
from this fixed underlying distribution of inputs. Relaxing these assumptions is 
a challenging but important task for future research. 


3.1 Probabilistic Abstractions for Perception 


We describe in detail the construction of the probabilistic abstraction α: S → 
D(S). We do not need access to the camera and only require black-box access to 
the network for constructing our abstraction.³ We assume S is a finite set such 
that #S = K, where #S denotes the cardinality of set S. We use α(s, s_est) to 
represent the probability associated with estimated state s_est. It is defined as, 


\[
\alpha(s, s_{est}) := \Pr_{img \sim c(s)} \left[\, p(img) = s_{est} \,\right] \tag{1}
\]


We estimate the probabilities in α by means of a confusion matrix. Let \overline{Img}_s ⊆ 
Img denote a representative test dataset for images corresponding to state s, i.e., 
every sample in \overline{Img}_s is assumed to be an independently drawn sample from c(s). 
We assume access to representative test datasets corresponding to every state 
s ∈ S. Let \overline{Img} := ⋃_{s∈S} \overline{Img}_s. For any test input img ∈ \overline{Img}, let p*(img) ∈ S be 
the label (i.e., the true underlying state) of img, which is known since \overline{Img} is a 
test dataset. For the sake of technical presentation, we assume a bijective map 
rep : S → [K] that maps every state in S to a number in [K] := {1,2,...,K}. 
We evaluate p on the test dataset \overline{Img} to construct a K × K confusion matrix C 
such that, for any k, k′ ∈ [K], the element in row k and column k′ of this matrix 
is given by the number of inputs from \overline{Img} with true state rep⁻¹(k) that the 
perception network p classifies as state rep⁻¹(k′). 


\[
C[k, k'] := \#\left\{\, img \in \overline{Img} \;\middle|\; p^*(img) = rep^{-1}(k) \,\wedge\, p(img) = rep^{-1}(k') \,\right\} \tag{2}
\]


Given the confusion matrix C, empirical estimates for the probabilities in α 
are calculated as follows, 


\[
\alpha\bigl(rep^{-1}(k), rep^{-1}(k')\bigr) = \frac{C[k, k']}{\sum_{k''=1}^{K} C[k, k'']} \tag{3}
\]


³ Our run-time guard does require white-box access. 
TaxiNet Example. For the TaxiNet application, we construct two probabilistic 
maps, α_cte and α_he, corresponding to each of the state variables cte and he, 
using a representative test data set with 11108 samples. Thus, α_cte is of type 
CTE → D(CTE) and α_he is of type HE → D(HE). Table 1 illustrates the confusion 
matrix for he. 

Table 1. Confusion Matrix for he (Total = 11108) 

         | Predicted 0 | Predicted 1 | Predicted 2
Actual 0 | 4748        | 2139        | 148
Actual 1 | 91          | 2010        | 0
Actual 2 | 744         | 211         | 1017

The mapping α_he is computed in a straightforward way: α_he(0,0) = 4748/(4748 + 2139 + 
148) = 0.675, giving the probability of estimating correctly that the value of 
he is zero. Similarly, α_he(1,0) = 91/(91 + 2010) = 0.043, giving the probability 
of estimating incorrectly that the value of he is zero instead of one. The 
corresponding DTMC code is as follows: 


[] he=0 -> 0.675:(he_est'=0) + 0.304:(he_est'=1) + 0.021:(he_est'=2); 
[] he=1 -> 0.043:(he_est'=0) + 0.957:(he_est'=1) + 0.0:(he_est'=2); 
[] he=2 -> 0.377:(he_est'=0) + 0.107:(he_est'=1) + 0.516:(he_est'=2); 


A similar computation is performed for constructing α_cte. The resulting code 
for the closed-loop system is shown in [37], in the appendix. 
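
The row normalization that turns a confusion matrix into the abstraction can be sketched as follows. The function name is an assumption made for this example; the counts are the he confusion matrix reported above, and rows are indexed by the actual class while columns are indexed by the predicted class.

```python
from typing import List


def abstraction_from_confusion(C: List[List[int]]) -> List[List[float]]:
    """Divide each row of the confusion matrix by its row total."""
    alpha = []
    for row in C:
        total = sum(row)
        if total == 0:
            raise ValueError("no observations for this state")
        alpha.append([count / total for count in row])
    return alpha


# Confusion matrix for he: rows = actual class, columns = predicted class.
C_he = [
    [4748, 2139, 148],
    [91, 2010, 0],
    [744, 211, 1017],
]
alpha_he = abstraction_from_confusion(C_he)

if __name__ == "__main__":
    for k, row in enumerate(alpha_he):
        print(k, [round(p, 3) for p in row])
```

Rounded to three decimals, the rows agree with the probabilities used in the DTMC commands for he.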


3.2. DNN Checks as Run-Time Guards 


We use DNN-specific checks as run-time guards to improve the performance 
of the perception network and therefore the safety of the overall system. We 
hypothesize that for inputs where the checks pass, the network is more likely to 
be accurate, and therefore, the system is safer. 

For our case study, we distill logical rules from the DNN that characterize 
misbehavior in terms of intermediate neuron values and use them as run-time 
guards (as described in Sect. 4). More generally, one can use any off-the-shelf 
pointwise DNN check, such as local robustness [10,15,19,35,39,41] or confidence 
checks for well-calibrated networks [21], as run-time guards (provided that they 
are fast enough to be deployed in practice). For practical reasons (TaxiNet is a 
regression model, it contains ELU [9] activations, and we do not have access to the 
training data) we cannot use off-the-shelf checks here. 


Modeling DNN Checks. Let us denote the application of (one or more) DNN-specific 
checks as a function check : (Img → S) × Img → 𝔹, such that, for 
perception network p ∈ Img → S and image img ∈ Img, check(p, img) = true if 
p passes the checks at input img, and check(p, img) = false otherwise. 

We further assume that a system that uses DNN checks as a run-time guard 
attempts to read the camera sensor multiple (one or more) times, until the 
check passes; and aborts (or goes to a fail-safe state) if the number of consecutive 
failed checks reaches a certain threshold. This logic can be generalized to consider 
more sophisticated safe-mode operations; for instance, the system can decelerate 


⁴ To simplify the DTMCs, we model the updates to cte and he as independent. For 
more precision, we can compute confusion matrices and α for the pair (cte, he). 
and/or notify an operator when the threshold is reached, as this could indicate 
serious sensor failure or adverse weather conditions. 

To model the effect of the run-time check in our analysis, we can define β as 
the probability that an image img generated by the camera c, for any state s, 
satisfies check(p, img) = true; 

\[
\beta := \Pr_{img \sim \mathcal{D}} \left[\, check(p, img) = true \,\right] \tag{4}
\]

Here D is the distribution obtained by combining c(s) for all states s ∈ S.⁵ To 
be more precise we can define a separate β_s for each state s. We estimate β using 
the representative set of images \overline{Img}, 

\[
\beta := \frac{\#\overline{Img}^{true}}{\#\overline{Img}} \tag{5}
\]

where \overline{Img}^{true} := {img ∈ \overline{Img} | check(p, img) = true}. 

For the overall analysis of the closed-loop system, irrespective of the state s, 
we can assume that the DNN check will pass with a probability β. Moreover, 
since the perception network only processes images that pass the DNN check, we 
construct a refined probabilistic abstraction α^true using conditional probability: 

\[
\alpha^{true}(s, s_{est}) := \Pr_{img \sim c(s)} \left[\, p(img) = s_{est} \;\middle|\; check(p, img) = true \,\right] \tag{6}
\]

We can estimate α^true as before, but the confusion matrix is built using only 
the images that pass the DNN check, i.e., for dataset \overline{Img}^{true} ⊆ \overline{Img}. 


TaxiNet Example. For TaxiNet, out of 11108 inputs, 9125 inputs (i.e., 82.1%) 
pass the DNN check resulting in the following code: 


i : [0..M] init 0; 
[] pc=0 & i<M -> 0.821:(v'=1)&(pc'=1)&(i'=0) + 0.179:(v'=0)&(i'=i+1); 


We model the result of applying the DNN check with variable v; v = 1 if the 
check returns true for an image and v = 0 otherwise. M is the number of allowed 
repeated sensor readings and i is used to count the number of failed DNN checks. 

The abstraction for state variables he (α_he) and cte (α_cte) is only computed 
for the inputs that pass the check (i.e., for v = 1) based on newly computed 
confusion matrices. The DTMC code for the closed-loop system with run-time 
guards is shown in [37], in the appendix. 
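
The quantities used by the guarded model can be sketched in Python as follows. All function names and the sample triples in the usage part are hypothetical; only the counts 9125 and 11108 come from the text. The abort probability assumes, as the DTMC command above does, that each sensor reading passes the check independently with the same probability.

```python
from typing import Iterable, List, Tuple


def pass_probability(num_passed: int, num_total: int) -> float:
    """Empirical estimate of the probability that the DNN check passes."""
    return num_passed / num_total


def abort_probability(beta: float, max_fails: int) -> float:
    """Probability that max_fails consecutive readings all fail the check,
    assuming independent readings with pass probability beta."""
    return (1.0 - beta) ** max_fails


def guarded_confusion(
    samples: Iterable[Tuple[int, int, bool]], num_classes: int
) -> List[List[int]]:
    """Confusion matrix built only from samples that pass the check.

    Each sample is (true_class, predicted_class, passed_check).
    """
    C = [[0] * num_classes for _ in range(num_classes)]
    for true_k, pred_k, passed in samples:
        if passed:
            C[true_k][pred_k] += 1
    return C


beta = pass_probability(9125, 11108)

if __name__ == "__main__":
    print(round(beta, 3))
    print(abort_probability(beta, 3))
    # Hypothetical samples, only to show the filtering step.
    demo = [(0, 0, True), (0, 1, False), (1, 1, True), (1, 0, True)]
    print(guarded_confusion(demo, 2))
```

Row-normalizing the filtered matrix, as in the unguarded case, then gives an estimate of the refined abstraction.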


3.3 Confidence Analysis 


The construction of the probabilistic abstractions relies on calculating empirical 
point estimates of the required probabilities. However, these empirical estimates 
lack statistical guarantees and can be off by an arbitrary amount from the true 
probabilities. To address this concern, we experiment with using FACT [5,7] 


5 To simplify the presentation, we omit the precise mathematical formulation for D. 
to calculate confidence intervals for the probability that the safety properties of 
the closed-loop system are satisfied. The inputs to FACT are: 1) a parametric 
DTMC m where each empirically estimated transition probability is represented 
by a parameter, 2) a PCTL formula φ, 3) an error level δ ∈ (0,1) and 4) an 
observation function O mapping state s to a tuple representing the number 
of observations for each outgoing transition from s; in our case, the number of 
observations can be obtained directly from the computed confusion matrices, i.e., 
O(s) = (C[rep(s), 1],...,C[rep(s), K]). FACT synthesizes a (1 − δ)-confidence 
interval [a,b] ⊆ [0,1] for the probability that φ is satisfied, given the observations. 
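
To give a feel for how observation counts translate into uncertainty, the sketch below computes a Wilson score interval for a single transition probability. This is explicitly not FACT's algorithm: FACT produces an interval for the probability of the whole PCTL property, whereas this example only bounds one entry of the abstraction, and the choice of the Wilson interval and of z = 1.96 (roughly 95% confidence) is an assumption made for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2.0 * trials)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    return max(0.0, center - half), min(1.0, center + half)


# Entry for actual he = 0, predicted he = 0: 4748 of 4748 + 2139 + 148 samples.
low, high = wilson_interval(4748, 4748 + 2139 + 148)

if __name__ == "__main__":
    print(low, high)
```

With several thousand observations per row the interval is narrow; rows with few observations would give much wider bounds.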


TaxiNet Example. The following partial code illustrates the parametric ver- 
sion of the code provided in Sect. 3.1 (with the complete code for the parametric 
models provided in [37], in the appendix). The first three lines represent the 
number of observations obtained from the confusion matrix in Table 1. 


param double x = 4748 2139 148; 
param double y = 91 2010; 
param double z = 744 211 1017; 


[] he=0 -> x1:(he_est'=0) + x2:(he_est'=1) + (1-x1-x2):(he_est'=2); 
[] he=1 -> y1:(he_est'=0) + (1-y1):(he_est'=1); 
[] he=2 -> z1:(he_est'=0) + z2:(he_est'=1) + (1-z1-z2):(he_est'=2); 


4 Experiments 


In this section, we report on the experiments that we conducted as part of our 
probabilistic safety analysis of the center-line tracking autonomous system. 

We built two DTMC models, m1 and m2, denoting the closed-loop center-line 
tracking system without and with a run-time guard, respectively. The airplane 
dynamics and the controller are identically modeled in the two DTMCs as dis- 
crete components. The code for the models (in PRISM syntax) and more details 
about the analysis are presented in [37], in the appendix. 


Mining Rules for Run-time Guards. We leverage our prior work [17] to 
extract rules of the form Pre ⟹ Post from the DNN. Post is the condition 
|cte* − cte| > 1.0 m ∨ |he* − he| > 5° on the regression model’s outputs and 
Pre is a condition over the neuron values in the three dense layers of TaxiNet 
(cte* and he* denote ground-truth values). The considered Post characterizes 
invalid behavior (as explained in [31]). If an input satisfies Pre, the DNN check 
is considered to have failed on that input. Pre can be evaluated efficiently during 
the forward pass of the model, making it a good run-time guard candidate. Here 
is an example of a rule for invalid behavior: 


$N_{1,85} \le -0.998 \land N_{0,59} \le 3.31 \land N_{1,94} \le -0.994 \land N_{1,15} > -0.999$
$\land\ N_{1,21} \le 1.711 \land N_{1,70} \le 11.088 \land N_{1,51} > -0.999 \land N_{1,21} > -0.637 \Rightarrow$
$|cte^* - cte| > 1.0\,\text{m} \lor |he^* - he| > 5^\circ$


$N_{i,j}$ denotes the $j$-th neuron in the $i$-th dense layer. The conditions over neuron
values can be checked during the forward pass of the DNN. If an input satisfies 
the conditions, it is interpreted as failing the check. If the check consecutively
fails M times, the system aborts, meaning that the system stops operating and
hands over control to a fail-safe mechanism (such as the pilot). More details on 
the rules and their deployment as run-time guards are in [37], in the appendix. 
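The abort logic described above can be sketched as a small counter. This is a minimal illustration under stated assumptions, not the deployed guard: the class name `ConsecutiveFailureGuard` is hypothetical, and we assume that a passing check resets the failure count to zero.

```python
from dataclasses import dataclass

@dataclass
class ConsecutiveFailureGuard:
    """Abort after `threshold` consecutive failed run-time checks."""
    threshold: int          # M in the text
    failures: int = 0       # length of the current run of failed checks
    aborted: bool = False

    def step(self, check_failed: bool) -> bool:
        """Record one check outcome; return True once the system has aborted."""
        if self.aborted:
            return True
        # Assumption: a passing check resets the run, a failing check extends it.
        self.failures = self.failures + 1 if check_failed else 0
        if self.failures >= self.threshold:
            self.aborted = True
        return self.aborted

guard = ConsecutiveFailureGuard(threshold=3)
outcomes = [True, True, False, True, True, True]
history = [guard.step(failed) for failed in outcomes]
print(history)
```

With M = 3, the first two failures are forgiven because the third check passes; only the final run of three failures triggers the abort.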


Confusion Matrices. The confusion matrices for the classification version of 
TaxiNet, computed for the two cases (without and with run-time guard) are 
shown in [37], in the appendix. The tables can be used by developers to better 
understand the DNN performance. For instance, the results summarized in the 
confusion matrices indicate that the DNN performs best for inputs lying on the 
center-line, which can be attributed to training being done mainly using scenarios 
where the plane follows the center-line. The model appears to perform better 
when the plane is heading left, as opposed to heading right, which may be due to 
camera position. These observations can be used by developers to improve the 
model, by training on more scenarios. Note also that the model does not make 
‘blatant’ errors, mistaking inputs on the left as being on the right (of center-line) 
or vice-versa (see e.g., entries with zero observations). Formal proofs can provide 
guarantees of absence of such transitions. 
Fig. 3. Probabilistic model checking results via PRISM 


Analysis. We analyzed m1 and m2 with respect to the two PCTL properties,
P=?[F (cte = -1)] (Property 1), and P=?[F (he = -1)] (Property 2)⁶. The
airplane is assumed to start from an initial position on the center-line, heading
straight. For m2, i.e., the model with a run-time guard, we also evaluate the
probability of the TaxiNet system going to the abort state using the property
P=?[F (v = 0 & i = M)] (Property 3), where M is the threshold for the number
of consecutive run-time check failures.

The probabilities of these properties being satisfied, calculated by PRISM, 
are shown in Fig. 3, where N is a constant in the DTMCs that dictates the length 
of the finite-time horizon considered for the analysis. Note that the system has an 
additional planning layer that calculates the waypoints for the airplane’s course 
on the taxiway. The system is only used for controlling the airplane movement 
between pairs of waypoints, hence a short horizon suffices. 
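For intuition about how such reachability probabilities depend on the horizon, the sketch below computes a step-bounded reachability probability on a toy DTMC by repeated matrix-vector multiplication. The three-state chain and its probabilities are hypothetical and unrelated to the TaxiNet models, and PRISM's own numerical engines are not reproduced here.

```python
def bounded_reach_probability(P, init, targets, horizon):
    """Probability of reaching a target state within `horizon` steps.

    P is a row-stochastic matrix given as a list of lists.
    """
    n = len(P)
    # prob[s] = probability of hitting a target from s within k steps (k = 0 here).
    prob = [1.0 if s in targets else 0.0 for s in range(n)]
    for _ in range(horizon):
        prob = [
            1.0 if s in targets
            else sum(P[s][t] * prob[t] for t in range(n))
            for s in range(n)
        ]
    return prob[init]

# Hypothetical 3-state chain: 0 = nominal, 1 = degraded, 2 = error (absorbing).
P = [
    [0.9, 0.1, 0.0],
    [0.5, 0.3, 0.2],
    [0.0, 0.0, 1.0],
]

for N in (1, 2, 4, 8):
    print(N, bounded_reach_probability(P, init=0, targets={2}, horizon=N))
```

The error probability is non-decreasing in the horizon, which is why the choice of a short horizon matters for the reported numbers.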

The confidence intervals computed with FACT are shown in Fig. 4, at different
confidence levels (0.95 to 0.99), for N = 4. For computing the intervals, we
ignore the transitions in the DTMCs that were not observed in our data (see [37]
for more details).
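To illustrate how the confidence level affects interval width for a single estimated probability, the sketch below computes a Wilson score interval for one binomial proportion. This is a stand-in technique chosen for brevity, not the algorithm used by FACT, which propagates intervals over all model parameters through the parametric model. The counts (2010 of 2101) are the he=1 observations from the listing in Sect. 3.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, trials, confidence):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# 2010 of the 2101 observations with he=1 were classified as he_est = 1.
for level in (0.95, 0.97, 0.99):
    lo, hi = wilson_interval(2010, 2101, level)
    print(level, round(lo, 4), round(hi, 4))
```

Raising the confidence level widens the interval, while adding observations narrows it, which matches the remark that intervals can be tightened by collecting more data.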


6 We rewrote the properties in terms of the discrete values. 
The PRISM analysis scales well; e.g., evaluating Property 1 for model m2
(N = 30) requires less than 0.1 s on an M1 MacBook Pro with 16 GB RAM. The
numbers are similar for other queries. However, the confidence analysis does not
scale as well; we could not go beyond N = 4 for a timeout of two hours, with
Property 1 being the hardest to check. Newer work, fPMC [13], addresses these scalability
challenges, but we found it not yet mature enough to be applied to our models.


Discussion and Lessons Learned. The experiments demonstrate the feasibil- 
ity of our approach, which enables reasoning about a complex DNN interacting 
with conventional (discrete-time) components via a simple probabilistic abstrac- 
tion. Our analysis not only provides qualitative (i.e., an error is reachable or not) 
but also quantitative (i.e., likelihood of error) results, helping developers assess 
the risk associated with the analyzed scenario. 


Fig. 4. Confidence interval results via FACT 


The results highlight the benefit of the run-time guards in improving the 
safety of the overall system; see Figs. 3(a,b) for lower error probabilities and 
Figs. 4(a,b) for tighter intervals for m2. The probability of aborting is very
small, indicating the efficacy of the fail-safe mechanism (see Fig. 3(c)). More
importantly, since the DNN demonstrates higher accuracy on the inputs where 
the run-time check passes, the results also indicate that improved accuracy of the 
DNN translates into improved safety. The computed probabilities and confidence 
intervals can be examined by developers and regulators to ensure that system 
safety is met at required levels. If the confidence intervals are too large, they can 
be made tighter by adding more data, as guided by the confusion matrices. 

Based on our feedback (confusion matrices), our industrial partner is retraining
the perception network. As the system is in its early stages, our industrial
partner was more interested in the trends suggested by our analysis than in
the exact probability results. For instance, our results indicate that safety will
increase with a better-performing network. The partner was also interested in 
how the DNN-specific analysis contributes to the system-level analysis. A prob- 
abilistic analysis is best viewed as an “average-case” analysis rather than “worst- 
case”. Nevertheless, such analysis is still useful since it conveys whether the 
system at least behaves safely in the average-case. 
5 Conclusion 


We demonstrated a method for the analysis of the safety of autonomous systems 
that use complex DNNs for visual perception. Our abstraction helps separate 
the concerns of DNN and conventional system development and evaluation. It 
also enables the integration of heterogeneous artifacts from DNN-specific anal- 
ysis and system-level probabilistic model checking. The approach produces not 
only qualitative results but also provides insights that can be used in quanti- 
tative safety assessment for AI/DNN-enabled systems. This is, potentially, an 
important step to fill one of the gaps of quantitative evaluation for future AI 
certification [1]. 

Future work involves experimentation with image data sets representing a 
variety of environment conditions. We also plan to refine our models, inducing 
finer partitions on the DNN, and validate them through simulations. Another 
future research direction involves the study of the composition of safety proofs 
for the system analyzed in different scenarios. Finally, we are working on compo- 
sitional analysis techniques to achieve worst-case (non-probabilistic) guarantees. 
Abstract. There is a pressing need for learning controllers that endow
systems with safety and goal-reaching properties, which are crucial
for many safety-critical systems. Reinforcement learning (RL) has been
deployed successfully to synthesize controllers from user-defined reward
functions encoding desired system requirements. However, synthesizing
provably correct controllers with safety and goal-reaching requirements
remains a significant challenge. To address this issue, we
design a special hybrid polynomial-DNN controller that is easy to verify
without losing its expressiveness and flexibility. This paper proposes
a novel method to synthesize such a hybrid controller based on RL,
low-degree polynomial fitting and knowledge distillation. It also gives
a computational approach that builds and solves a constrained optimization
problem derived from the verification conditions to produce barrier
certificates and Lyapunov-like functions, which guarantee that every trajectory
starting from the initial set of the system under the resulting controller
satisfies the given safety and goal-reaching requirements. We evaluate
the proposed hybrid controller synthesis method on a set of benchmark
examples, including several high-dimensional systems. The results validate
the effectiveness and applicability of our approach.


Keywords: Formal verification · Controller synthesis · Reinforcement
learning · Barrier certificate · Lyapunov-like function


1 Introduction 


The design of control and decision-making software for autonomous systems is
a key part of many industrial applications, such as unmanned aerial vehicles,
ground vehicles and general robots, and has therefore attracted continued attention
over the last decade [7, 9, 12, 14]. Among the many research problems in this field,
a highly challenging one is controller synthesis, i.e., building control systems
that guarantee safety and reachability simultaneously. As an emerging
approach, machine learning has also been applied to this
problem in recent years. Several existing techniques focus on learning a control
policy from user-defined reward/cost functions that encode the required properties.
A typical way is to use the framework of reinforcement learning (RL),
which evaluates and improves the controller’s performance by interacting with
environments and systems. Because of its strong ability to deal with nonlinear
and/or uncertain (or nondeterministic) dynamical systems of high dimensions, as
well as the universal approximation power of deep neural networks, RL-based
controller synthesis has been extensively studied, and substantial progress
has been made by different research teams [22, 23]. However, formal reasoning about
the required properties of such DNN-controlled dynamical systems is an arduous
and challenging problem, which still limits the practical use of RL.
For safety/reachability verification of the system under the learned controller,
one main approach is to compute the reachable sets of the system [8, 13, 30],
which requires computing the solutions to the ODEs of the system;
the scalability of these approaches is therefore largely restricted. Another major
approach is to synthesize a certificate by solving the associated SMT problems
[6, 16, 31], which also has limited scalability because of the complexity of symbolic
computation in general-purpose SMT solvers. In this paper, we utilize
the advantages of RL to train an elaborately designed hybrid controller, which
makes the system easier to verify against safety and goal-reaching requirements
while maintaining controllability.

Our proposed hybrid controller takes the form of a low-degree polynomial
plus a relatively small neural network, called a polynomial-DNN controller.
The learning-based synthesis of the polynomial-DNN controller is divided
into the following four phases: (1) we first train a well-performing DNN controller
by RL with safety and goal-reaching requirements; (2) we then
fit the trained DNN roughly by a polynomial with a prescribed low degree
bound as one part of the hybrid structure; (3) we construct a small, special
neural network (NN) with square activation functions on the hidden layer and
tanh on the output layer as the supplement to the polynomial part, and subsequently
distill an initial polynomial-DNN controller from the original DNN controller;
(4) finally, starting from the distilled controller, we use RL to fine-tune a
well-performing polynomial-DNN controller.

Thanks to the hybrid form consisting of a polynomial and a small NN with
a special structure, the obtained hybrid controller is easier to verify while maintaining
its expressiveness and flexibility, for two main reasons: (1) for the sake of
verification efficiency, the original DNN is fitted by a low-degree polynomial
through a coarse approximation, which can be easily obtained and significantly
reduces the difficulty of formal verification; (2) the NN part compensates for
the loss of controller performance caused by the coarse polynomial approximation.
Owing to this structure, the system with the polynomial-DNN controller can
be equivalently transformed into a polynomial form via system recasting, which
makes post-verification easily solvable.
The necessity of a polynomial-DNN type controller can be
explained as follows. Transforming a DNN into polynomial form enables the application
of efficient polynomial solving techniques for formal verification, but there
is no guarantee that a polynomial of a specified degree bound can fit a DNN
with high accuracy; meanwhile, the approximation and the corresponding verification
problem become quite complicated as the degree of the polynomial
increases, which may also cause the verification to fail. Therefore,
we resort to a low-degree polynomial approximation and simultaneously retrain a
small NN to compensate for the loss of accuracy, since a rough approximating
polynomial cannot replace the whole DNN controller, and the verification
may fail for the system controlled by the polynomial part alone. The hybrid controller
thus balances expressiveness against ease of formal verification.
To check the effectiveness of the proposed approach, we have evaluated the
hybrid controller synthesis on a set of commonly used benchmark examples. To
summarize, the main contributions of this paper are as follows:


— We propose a method to synthesize a hybrid polynomial-DNN controller sub- 
ject to reach-avoid constraints, via RL incorporated with lower degree polyno- 
mial fitting and distillation based retraining, which not only maintains good 
control performance but also makes post-verification solvable. 

— We delicately design a residual network as a compensation of the target con- 
troller. The particularity of the differential form of the residual network allows 
us to cast the differential equations of the control systems into an equivalent 
polynomial form which is conducive to formal verification. 

— We carry out a detailed experimental evaluation on a set of benchmarks 
to demonstrate the effectiveness of our approach, and the necessity of the 
controller in such a hybrid form through ablation studies. 


1.1 Related Works 


Several research works focus on controller synthesis for the safety requirement,
in which a typical way is to use reinforcement learning or supervised learning
to build the overall learning framework for synthesizing safety certificates
(such as control barrier functions, CBFs) [1, 26-29].

For the goal-reaching requirement, most existing works concentrate on
building controllers to drive the system to reach a specified set within a time
bound [8, 11, 13, 30]. Some others focus on synthesizing a control policy that
makes the system asymptotically converge to a specified goal state set, which
is called the stability requirement. Generating Lyapunov functions as certificates
is a common practice in this setting [3-5, 15, 25].

In fact, learning a reach-avoid controller, i.e., one that meets both safety and
goal-reaching requirements, is a much more complicated problem. An example was
given in [10], where a correct-by-construction controller that consists of a reference
controller and a tracking controller has been successfully built to drive the
actual trajectory according to the reference trajectory, and different reference
controllers have been pre-designed for different scenarios.
Recently, a new learning-based approach was implemented in [17], where the
safe and goal-reaching policy is constructed by jointly learning two additional
certificate functions using supervised learning. Note that there is a
risk of synthesizing false certificates, as the certificate constraints are only satisfied
at the sampled points. Although one can perform a posteriori formal verification
to overcome this weakness, it would be difficult to carry out the verification with
several DNNs in the system. By comparison, our synthesized hybrid polynomial-DNN
controller has clear advantages for formal verification.


2 Preliminaries 


Notations. Let $\mathbb{R}[x]$ denote the ring of polynomials with coefficients in $\mathbb{R}$ over
variables $x = [x_1, x_2, \ldots, x_n]^T$, and let $\mathbb{R}[x]^n$ denote the set of $n$-dimensional
polynomial vectors. Let $\Sigma[x] \subset \mathbb{R}[x]$ be the set of SOS polynomials. The distance from $x$ to
a set $S$ is defined by $\|x\|_S = \inf_{s \in S} \|x - s\|_2$. A continuous function
$\alpha : [0, a) \to [0, +\infty)$ for some $a > 0$ is said to belong to class $\mathcal{K}$ if it is strictly increasing
and satisfies $\alpha(0) = 0$. A continuous function $\beta : (-b, c) \to (-\infty, +\infty)$ for some
$b, c > 0$ is said to belong to extended class $\mathcal{K}$ if it is strictly increasing and
satisfies $\beta(0) = 0$. A continuous function $\gamma : [0, c) \times [0, \infty) \to [0, +\infty)$ for some
$c > 0$ belongs to class $\mathcal{KL}$ if, for each fixed $s$, the mapping $\gamma(r, s)$ belongs to
class $\mathcal{K}$ with respect to $r$, and for each fixed $r$, the mapping $\gamma(r, s)$ is decreasing
with respect to $s$, and $\gamma(r, s) \to 0$ as $s \to \infty$.

This section formulates the safety and goal-reaching controller synthesis prob- 
lem. A controlled continuous dynamical system is modeled by first-order ordinary 
differential equations 


$$\dot{x} = f(x, u), \quad \text{with } u = k(x), \tag{1}$$


where $x \in \Psi \subseteq \mathbb{R}^n$ are the system states, $u \in U \subseteq \mathbb{R}^m$ are the control inputs,
and $f \in \mathbb{R}[x]^n$ is the vector field defined on the state space $D \subseteq \mathbb{R}^n$.

Assume $f$ satisfies the local Lipschitz condition, which ensures that (1) has a
unique solution $x(t, x_0)$ in $D$ for every initial state $x_0 \in D$ at $t = 0$. A dynamical
system is equipped with a domain $\Psi \subseteq D$ and an initial set $\Theta \subseteq \Psi$, represented as
a triple $\mathcal{C} = (f, \Psi, \Theta)$. Given a prespecified unsafe region $X_u \subseteq D$, we say that the
system $\mathcal{C}$ is safe if no trajectory starting from $\Theta$ can evolve into the unsafe
region $X_u$, a property that has been widely investigated in safety-critical applications.


Definition 1 (Safety). For a controlled constrained continuous dynamical system
(CCDS) $\mathcal{C} = (f, \Psi, \Theta)$ and a given unsafe region $X_u$, the system is safe if
for all $x_0 \in \Theta$, there does not exist $t_1 > 0$ such that

$$\forall t \in [0, t_1).\ x(t, x_0) \in \Psi \quad \text{and} \quad x(t_1, x_0) \in X_u.$$


At the same time, another important property, called goal-reaching, has received
much attention; it is a generalization of stability.

Definition 2 (Goal-reaching). Given a controlled CCDS $\mathcal{C} = (f, \Psi, \Theta)$ and a
set of goal states $X_g \subseteq D$, the system $\mathcal{C}$ is goal-reaching with respect to the goal
set $X_g$ if there exists a $\mathcal{KL}$-function $\gamma$ such that for any $x_0 \in \Theta$,

$$\|x(t)\|_{X_g} \le \gamma(\|x(0)\|_{X_g}, t) \quad \text{for all } t \ge 0.$$


Definition 3 (Safe and Goal-reaching Controller Synthesis). Given a
controlled CCDS $\mathcal{C} = (f, \Psi, \Theta)$ with $f$ defined by (1), an unsafe set $X_u$, and
a goal set $X_g$, design a locally Lipschitz continuous feedback control law $k$ such
that the closed-loop system $\mathcal{C}$ with $f = f(x, k(x))$ is both safe and goal-reaching
as per Definitions 1 and 2.


The concept of barrier certificates plays an important role in safety verifi- 
cation of continuous systems. The essential idea is to use the zero level set of 
a barrier certificate B(x) as a barrier to separate all the reachable states from 
the unsafe region. The following concept of barrier certificate, adapted from [24], 
can be used to guarantee the safety of a given controlled CCDS. 


Theorem 1. [24] Given a controlled CCDS $\mathcal{C} = (f, \Psi, \Theta)$ with $f$ defined by
(1), a feedback control law $u = k(x)$, and the unsafe region $X_u$, suppose there
exists a real-valued function $B : \Psi \to \mathbb{R}$ satisfying the following conditions:

(i) $B(x) \ge 0 \quad \forall x \in \Theta$,

(ii) $B(x) < 0 \quad \forall x \in X_u$,

(iii) $B(x) = 0 \Rightarrow \mathcal{L}_f B(x) > 0 \quad \forall x \in \Psi$,

where $\mathcal{L}_f B(x)$ denotes the Lie derivative of $B(x)$ along the vector field $f(x)$, i.e.,
$\mathcal{L}_f B(x) = \sum_{i=1}^{n} \frac{\partial B}{\partial x_i} \cdot f_i(x)$. Then $B(x)$ is a barrier certificate for the closed-loop
system $\mathcal{C}$ with the control law $k(x)$, and the safety of system $\mathcal{C}$ is guaranteed.

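The three barrier conditions can be spot-checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not the verification procedure of this paper: the closed-loop system $\dot{x} = -x$, the candidate $B(x) = 1 - x_1^2 - x_2^2$, and the initial and unsafe sets are all hypothetical. Checking sampled points is evidence only and does not constitute a proof.

```python
import math
import random

def f(x):
    """Hypothetical closed-loop vector field: x' = -x (already includes u = k(x))."""
    return [-x[0], -x[1]]

def B(x):
    """Candidate barrier: nonnegative inside the unit disc, negative outside."""
    return 1.0 - x[0] ** 2 - x[1] ** 2

def lie_derivative_B(x):
    """L_f B(x) = sum_i dB/dx_i * f_i(x), with the gradient written out by hand."""
    grad = [-2.0 * x[0], -2.0 * x[1]]
    return sum(g * fi for g, fi in zip(grad, f(x)))

def on_circle(radius, angle):
    return [radius * math.cos(angle), radius * math.sin(angle)]

rng = random.Random(0)
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(1000)]

# (i) initial set: disc of radius 0.5; (ii) unsafe set: sampled with norm in [2, 3];
# (iii) zero level set of B: the unit circle.
cond_i = all(B(on_circle(rng.uniform(0.0, 0.5), a)) >= 0.0 for a in angles)
cond_ii = all(B(on_circle(rng.uniform(2.0, 3.0), a)) < 0.0 for a in angles)
cond_iii = all(lie_derivative_B(on_circle(1.0, a)) > 0.0 for a in angles)
print(cond_i, cond_ii, cond_iii)
```

On the unit circle the Lie derivative equals $2(x_1^2 + x_2^2) = 2$, so trajectories cross the zero level set of $B$ only inward and can never reach the unsafe set.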
For the goal-reaching controller design, we use a more general Lyapunov-like 

function which is introduced by the following definition. 


Definition 4 (Lyapunov-like function). Given a continuous system
$\mathcal{C} = (f, \Psi, \Theta)$ and the set of goal states $X_g \subseteq \Psi$, a continuously differentiable
real-valued function $V : \Psi \to \mathbb{R}$ is said to be a Lyapunov-like function if

(i) $\{x \mid V(x) \le 0\} \ne \emptyset$ and $\{x \mid V(x) \le 0\} \subseteq X_g$,

(ii) $\mathcal{L}_f V(x) \le -\beta(V(x)) \quad \forall x \in \Psi$,

where $\beta$ is some extended class $\mathcal{K}$ function, and $\mathcal{L}_f V(x) = \sum_{i=1}^{n} \frac{\partial V}{\partial x_i} \cdot f_i(x)$.

As mentioned in [17], the above Lyapunov-like function is more general than the
classic one used in [3, 4, 21, 25]. The Lyapunov-like function does not
require $\mathcal{L}_f V(x)$ to be negative definite everywhere; that is, $\mathcal{L}_f V(x) > 0$ can
happen on $\{x \mid V(x) < 0\}$, which makes the function less restrictive.
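The following one-dimensional example, constructed here purely for illustration and not taken from the benchmarks, shows a Lyapunov-like function whose Lie derivative is positive inside the goal set. The system, the function $V$, and the choice $\beta(s) = s$ are all assumptions.

```latex
\begin{align*}
\dot{x} &= f(x) = \tfrac{1}{2}\,x\,(1 - x^2), \qquad V(x) = x^2 - 1, \qquad \beta(s) = s,\\
\mathcal{L}_f V(x) &= \frac{\partial V}{\partial x}\, f(x) = 2x \cdot \tfrac{1}{2}\,x\,(1 - x^2) = x^2 (1 - x^2),\\
-\beta(V(x)) &= 1 - x^2,\\
-\beta(V(x)) - \mathcal{L}_f V(x) &= (1 - x^2) - x^2 (1 - x^2) = (1 - x^2)^2 \;\ge\; 0 .
\end{align*}
```

Hence condition (ii) holds for every $x$, and $\{x \mid V(x) \le 0\} = [-1, 1]$ is nonempty, so $V$ is a Lyapunov-like function for any goal set $X_g \supseteq [-1, 1]$. At the same time, $\mathcal{L}_f V(x) = x^2(1 - x^2) > 0$ for $0 < |x| < 1$, i.e., on part of the region where $V(x) < 0$, which a classic Lyapunov function would not allow.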


Theorem 2. For a controlled CCDS $\mathcal{C} = (f, \Psi, \Theta)$ with $f$ defined by (1) and a
set of goal states $X_g \subseteq \Psi$, if $V(x)$ is a Lyapunov-like function as in Definition
4, then the system under $u = k(x)$ is goal-reaching with respect to $X_g$.

Combining Theorem 1 and Theorem 2, we obtain the following assertion, which states
that the existence of barrier certificates and Lyapunov-like functions guarantees
that the control law is both safe and goal-reaching. Hereafter, we refer to both
barrier and Lyapunov-like functions as certificate functions for simplicity.
3 Hybrid Polynomial-DNN Controllers Training 


For the safe and goal-reaching controller synthesis problem, we design an easy-to-verify
control policy with the aid of reinforcement learning (RL), based on
barrier certificate and Lyapunov-like function generation. It is hard
for a controller with a simple structure to guarantee safe and goal-reaching
behaviors for large-scale systems. In contrast, controllers with complex structures
give the system more flexible behaviors, but
reach-avoid verification of a system with such a complex controller requires
much more computational effort. To make verification tractable, we propose a method
to learn a controller with a special structure, the hybrid polynomial-DNN controller,
which is easily verifiable and can be customized to safety and goal-reaching
requirements. Specifically, this hybrid controller consists of a polynomial and a
small neural network with a single hidden layer. It is expected to
exhibit behaviors similar to those of the original complex DNN controller, but is much
easier to verify thanks to its special structure, which will be elaborated in
Sect. 4.

To achieve this, we adopt a low-degree polynomial to roughly approximate
the DNN. Then we fix the structure of a small neural network and append
it to the low-degree polynomial to construct a hybrid controller, which
is retrained using RL. To accelerate the retraining process, we use knowledge
distillation to obtain an initialization of the NN part of the hybrid controller. In
summary, the learning-based process of the hybrid controller synthesis is divided
into the following three stages, as shown in Fig. 1.
Fig. 1. The diagram of training framework. 


— Train a deep neural network controller via RL. Based on reinforce- 
ment learning, we train a deep neural network (DNN) controller for the given 
control system directly. Briefly, the RL procedure continuously uses the current
controller to drive the system by interacting with the environment, and
updates the relevant parameters of the controller by rewarding and penaliz- 
ing. Through sufficient simulation and training, we expect to obtain a DNN 
controller that enables the system behavior to avoid the unsafe set and reach 
the specified target set with high probability. 

— Fit the DNN controller by a polynomial and distill a residual net- 
work by measuring the fitting error. From the learned DNN controller in 
the previous process, we reconstruct a hybrid controller consisting of a poly- 
nomial and a small neural network with a single hidden layer. Specifically, 
we approximate the trained DNN controller with an appropriate polynomial 
by a sampling-based method. The approximate polynomial is used as the main
component of the hybrid controller. We further evaluate the error between 
the original DNN and the polynomial approximation by distillation learning, 
which yields a small neural network as a refined module. 

— Generate and retrain a hybrid controller by fine-tuning a small 
neural network from the distilled network. We construct a special small 
NN with square and tanh activation functions on the hidden and output layers 
respectively, which helps to transform the hard verification problem into a 
tractable polynomial one. At last, we retrain the hybrid controller consisting 
of the polynomial part and the small NN template by fine-tuning the small 
network initialized by the result from the distillation learning. 


3.1 Training Well-Performing DNN Controllers Using RL 


As illustrated in Fig. 1, the RL method is applied to train a well-performing
controller, so that the system is able to avoid obstacles and reach the goal region
within the time bound.

We construct the reward function by encoding the desired behaviours
of the closed-loop system under the DNN controller, namely unsafe-region
avoidance and goal-region reachability. We expect RL to synthesize
an ideal controller from the designed reward, such that all trajectories of the
closed-loop system starting from the initial set $\Theta$ never evolve into the unsafe region
$X_u$ and reach the desired region $X_g$ under the trained DNN controller. The
reward function design should therefore address two aspects, i.e., reward behaviours
that stay far away from the unsafe region, and reward behaviours that approach the goal
region. In terms of the safety requirement, the reward function should penalize
behaviours approaching $X_u$. Thus, the reward function can be defined as a
joint Gaussian distribution on the system state, whose expectation and variance
are the center and radius of $X_u$, respectively,

$$\mathrm{reward}_u(x_t) = -\exp\Big(-\sum_{i=1}^{n} \Big(\frac{x_i(t) - x_i^u}{\rho_u}\Big)^2\Big),$$

where $x_u = (x_1^u, \ldots, x_n^u) \in X_u \subseteq D$ is the center of $X_u$ and $\rho_u$ is the radius of
$X_u$. Similarly, the reward for the goal-reaching purpose could be defined as a
joint Gaussian distribution,

$$\mathrm{reward}_g(x_t) = \exp\Big(-\sum_{i=1}^{n} \Big(\frac{x_i(t) - x_i^g}{\rho_g}\Big)^2\Big),$$

where $x_g = (x_1^g, \ldots, x_n^g)$ and $\rho_g$ are the center and the radius of $X_g$, respectively.


The entire reward function consists of the above two components, i.e.,

$$\mathrm{reward}(x_t) = \lambda \cdot \mathrm{reward}_u(x_t) + (1 - \lambda) \cdot \mathrm{reward}_g(x_t),$$

to achieve the task of safety and goal reachability, where $0 < \lambda < 1$ is the
parameter that controls the weights of $\mathrm{reward}_u(x_t)$ and $\mathrm{reward}_g(x_t)$.
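A composite reward of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the reward used in the experiments: we assume Gaussian-shaped terms of the form $\exp(-\sum_i ((x_i - c_i)/\rho)^2)$, a negative sign on the unsafe term so that approaching $X_u$ is penalized, and hypothetical two-dimensional regions.

```python
from math import exp

def gaussian_bump(x, center, radius):
    """exp(-sum(((x_i - c_i) / radius)^2)), equal to 1 at the center."""
    return exp(-sum(((xi - ci) / radius) ** 2 for xi, ci in zip(x, center)))

def reward(x, unsafe_center, unsafe_radius, goal_center, goal_radius, lam=0.5):
    """Weighted sum of an unsafe-region penalty and a goal-region bonus."""
    reward_u = -gaussian_bump(x, unsafe_center, unsafe_radius)  # assumed sign
    reward_g = gaussian_bump(x, goal_center, goal_radius)
    return lam * reward_u + (1 - lam) * reward_g

# Hypothetical 2-D regions, chosen only for illustration.
unsafe = ((0.0, 2.0), 0.5)
goal = ((3.0, 0.0), 0.5)

for state in [(0.0, 2.0), (1.5, 1.0), (3.0, 0.0)]:
    print(state, round(reward(state, *unsafe, *goal), 4))
```

The weight `lam` plays the role of $\lambda$: values close to 1 emphasize obstacle avoidance, and values close to 0 emphasize reaching the goal.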

The remaining problem is to train the controller via RL. Here we use Deep
Deterministic Policy Gradient (DDPG) [20], a popular RL approach
suited for continuous control applications. The DDPG algorithm combines
value-based and policy-based methods, and is made up of two neural networks:
the critic network and the actor network.

To train the desired controller, we first generate a set of initial states from
$\Theta$. For each sampled initial state $x_0$, with the help of $u_{RL}$, one obtains the
associated trajectory as a discrete-time state sequence $\{x_0, x_1, \ldots, x_t, \ldots, x_m\}$
that does not enter the unsafe area, and then collects the transition tuples
$(x_t, x_{t+1}, u_t, \mathrm{reward}(x_t))$ to form a replay buffer. Every few time steps, a batch
of data is sampled from the replay buffer to update the parameters of the critic
network and the actor network, and then the new controller is used to simulate
trajectories and collect new data until the controller behaves well.
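The data-collection part of this loop can be sketched as follows. The sketch covers only the replay buffer and trajectory rollout; the actor and critic updates of DDPG and the filtering of unsafe trajectories are omitted. The names `ReplayBuffer` and `collect`, the scalar dynamics, and the proportional controller are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (x_t, x_next, u_t, reward) transition tuples."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, x_t, x_next, u_t, reward):
        self.storage.append((x_t, x_next, u_t, reward))

    def sample(self, batch_size, rng=random):
        """Draw a uniform random mini-batch without replacement."""
        return rng.sample(list(self.storage), min(batch_size, len(self.storage)))

def collect(buffer, x0, controller, step, reward_fn, horizon):
    """Roll out one trajectory and push every transition into the buffer."""
    x = x0
    for _ in range(horizon):
        u = controller(x)
        x_next = step(x, u)
        buffer.add(x, x_next, u, reward_fn(x))
        x = x_next

# Hypothetical scalar system x_next = x + 0.1 * u with a proportional controller.
buffer = ReplayBuffer(capacity=1000)
collect(buffer, x0=1.0, controller=lambda x: -x,
        step=lambda x, u: x + 0.1 * u, reward_fn=lambda x: -abs(x), horizon=20)
batch = buffer.sample(8, rng=random.Random(0))
print(len(buffer.storage), len(batch))
```

In a full training loop, each sampled batch would be passed to the critic and actor updates before the next rollout.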


3.2 Polynomial Approximation 


Following the RL training process in Sect. 3.1, one will likely adopt a complex
DNN structure to obtain a well-performing DNN controller. For safety-critical
systems, the properties of such synthesized controllers, such as safety and
goal-reaching, need to be formally guaranteed. However, it is challenging to
verify specified properties for the closed-loop system under the trained DNN
controller due to its complexity. One option is to approximate the trained DNN
with extremely high precision by a high-degree polynomial and to use it
as the controller candidate, to be verified with polynomial constraint
solving. However, the corresponding verification problem for such a high-degree
polynomial controller may have prohibitively high computational complexity,
as explained in the experiment section.

Based on the DNN controller $u_{RL}$ trained through RL, we construct an easily
verifiable controller of a hybrid form, which can lead the system to be safe
and goal-reaching. We first roughly approximate $u_{RL}$ by a low-degree
polynomial, denoted by $p(x)$, as one part. Afterwards, we retrain a small NN,
denoted by $k(x)$, with one hidden layer to compensate for the approximation
error between $u_{RL}$ and $p(x)$. The hybrid polynomial-DNN controller is
then $p(x) + k(x)$. This subsection focuses on how to obtain
the approximate polynomial $p(x)$ from sampling points.
Concretely, a real coefficient vector $c$ is used to parameterize a polynomial
$p(x, c)$ of a given degree $d$, i.e., $p(x, c) = \sum_j c_j b_j(x)$, where the $b_j(x)$ are
monomials with total degree $\le d$. Given the sampling points, we can obtain the coefficient
vector $c^*$ by solving a least squares problem. The polynomial
$p(x, c^*)$ is then the approximation of $u_{RL}(x)$ on $\Psi$, denoted by $p(x)$ for brevity, and
the residual function $r(x)$ denotes the error between the approximate polynomial
$p(x)$ and the DNN controller, i.e., $r(x) = u_{RL}(x) - p(x)$.
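The least squares fit and the residual can be sketched for the one-dimensional case in plain Python. This is a minimal illustration, assuming a scalar state, the monomial basis $1, x, \ldots, x^d$, and a hypothetical stand-in `u_rl` for the trained DNN controller; the normal equations are solved by Gaussian elimination, which is adequate only for low degrees.

```python
from math import tanh

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c_0..c_d for sum_j c_j * x**j (1-D case)."""
    m = degree + 1
    # Normal equations (A^T A) c = A^T y for the Vandermonde matrix A.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs

def evaluate(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))

# Hypothetical stand-in for the trained controller u_RL on the domain [-1, 1].
def u_rl(x):
    return tanh(2.0 * x)

samples = [-1.0 + 2.0 * i / 50 for i in range(51)]
coeffs = fit_polynomial(samples, [u_rl(x) for x in samples], degree=3)
residual = [u_rl(x) - evaluate(coeffs, x) for x in samples]   # r(x) = u_RL(x) - p(x)
print([round(c, 4) for c in coeffs], round(max(abs(r) for r in residual), 4))
```

The residual values are exactly what the small NN is later asked to compensate for; a low degree keeps the fit cheap at the price of a larger residual.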

Having $p(x)$, we cannot simply regard it as the controller, because the error
$r(x)$ between $u_{RL}(x)$ and $p(x)$ cannot be ignored. To take this into account,
we compensate for the error by fitting the residual function $r(x)$, by means of
retraining a hybrid controller $p(x) + k(x|\theta')$ to rectify the system behavior, where
$\theta'$ denotes the parameters of the NN part to be learned.


3.3 Training the Residual Controller 


In this part, we retrain the NN part to compensate for the difference between the
system behavior under the polynomial part $p(x)$ and that under the original DNN controller $u_{RL}$.


The Structure of the Residual Network. We design a special neural network
as the compensation to make the resulting verification problem tractable.
As illustrated in Fig. 2, a typical DNN has a layered architecture and can be
represented as a composition of its L layers: k(x|θ') = l_L ∘ l_{L−1} ∘ ··· ∘ l_1(x),
where l_i(x) = σ_i(W_i x + b_i) is parameterized by a weight matrix W_i and a bias
vector b_i, and all the parameters are denoted by θ' for brevity. This work takes
σ_i to be the square activation on the hidden layers and the tanh activation
on the output layer L, as shown in Fig. 2. This special setting has two
advantages: i) the normalized output range [−1, 1] helps the training process
converge; ii) a control system with an NN controller of this type can be
transformed into polynomial form by system recasting (cf. Sect. 4.1 for more
details). Regarding ii), we introduce a new variable x_{n+1} to represent
the NN output, i.e., x_{n+1} := tanh(h(x)), where h(x) := l_{L−1} ∘ ··· ∘ l_1(x)
denotes the polynomial part of the NN. The main observation that allows us to
transform the system with this NN controller into an equivalent polynomial
system is that the derivative of this special NN can be expressed as


$$\dot{x}_{n+1} = (1 - x_{n+1}^2)\,\dot{h}. \tag{2}$$


In practice, we construct such a small NN with a single hidden layer, because
a neural network of this simple structure, added to the controller as the
compensation, is enough to control the systems well.
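The derivative identity (2) can be checked numerically on a toy network. The sketch below assumes a two-dimensional state, three hidden neurons, and hypothetical weights `W1`, `b1`, `w2`, `b2`; it also assumes an affine read-out before the tanh, so that h is a scalar polynomial.

```python
import math

# Hypothetical parameters of a 2-input residual network with 3 hidden neurons.
W1 = [[0.5, -0.3], [0.2, 0.4], [-0.6, 0.1]]
b1 = [0.1, -0.2, 0.05]
w2 = [0.7, -0.5, 0.3]
b2 = 0.1

def h(x):
    """Polynomial part: affine map, square activation, affine read-out."""
    hidden = [(sum(w * xi for w, xi in zip(row, x)) + b) ** 2
              for row, b in zip(W1, b1)]
    return sum(w * z for w, z in zip(w2, hidden)) + b2

def grad_h(x):
    """Gradient of h, using d/dx (w.x + b)^2 = 2 (w.x + b) w."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    return [sum(w2[j] * 2.0 * pre[j] * W1[j][i] for j in range(len(W1)))
            for i in range(len(x))]

def k(x):
    """Network output x_{n+1} = tanh(h(x)), always inside (-1, 1)."""
    return math.tanh(h(x))

def k_dot(x, x_dot):
    """Right-hand side of (2): (1 - x_{n+1}^2) * h_dot, with h_dot = grad h . x_dot."""
    h_dot = sum(g * v for g, v in zip(grad_h(x), x_dot))
    return (1.0 - k(x) ** 2) * h_dot

# Compare with a central finite difference of k along the direction v.
x0, v = [0.3, -0.4], [1.0, 0.5]
eps = 1e-6
forward = [a + eps * d for a, d in zip(x0, v)]
backward = [a - eps * d for a, d in zip(x0, v)]
finite_difference = (k(forward) - k(backward)) / (2 * eps)
```

Because the right-hand side of (2) is polynomial in x and x_{n+1}, no tanh evaluation is needed once x_{n+1} is treated as a state variable.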


The Residual Controller Training. We then retrain the hybrid controller
p(x) + k(x|θ') using the RL technique described in the previous subsection.
To improve training efficiency, knowledge distillation is used to initialize
the NN part k(x|θ'). This is easily achieved by regarding the residual
function r(x) as the ensemble network (also called the teacher network) and
distilling its knowledge into a small model (i.e., the student network). The
learned student network realizes the knowledge transfer from the teacher
network and provides the initial values of k(x|θ') for further training.

Fig. 2. Structure of the small neural network in the hybrid controller.

We reiterate that the purpose of adding k(x|θ') to the polynomial part p(x)
is to make the hybrid controller drive the system to perform as expected.
We achieve this not by training k(x|θ') to satisfy u_RL = p(x) + k(x|θ'),
but by requiring that the controller p(x) + k(x|θ') drive the following
closed-loop system to be safe and goal-reaching: ẋ = f(x, p(x) + k(x|θ')).

We need to train a hybrid controller p(x) + k(x|θ') for the above system to
obtain the parameters θ'. Using the learned parameters of the student network
from the knowledge distillation as the initialization of k(x|θ'), we simulate
the system to collect a dataset of sampled trajectories, and use the DDPG
algorithm to achieve the control objective of safety and goal-reaching,
following the reward design elaborated in Sect. 3.1. Once the training is
completed, we obtain the desired hybrid polynomial-DNN controller
u(x) = p(x) + k(x), where p(x) is the polynomial part and k(x) is the small
neural network.


4 Reach-Avoid Verification with Lyapunov-Like Functions 
and Barrier Certificates Generation 


To ensure the safety and goal-reaching properties of the specified control
system under the synthesized controller, a relaxed surrogate is to generate a
Lyapunov-like function and a barrier certificate, as stated in Theorem 1 and
Theorem 2. To make the computation tractable, the basic idea is to translate
the problem of producing barrier certificates and Lyapunov-like functions into
a solvable polynomial optimization problem. Specifically, we first transform
the ODEs f of the CCDS through system recasting; we then abstract the initial
set Θ, the unsafe region X_u, the goal set X_g, and the system domain Ψ by
polynomial expressions. Finally, we establish the polynomial optimization
problems arising from the constraints of the barrier certificate and the
Lyapunov-like function, and then solve the resulting polynomial optimization
problem to produce a barrier certificate and a Lyapunov-like function, which
guarantee the safety and goal-reaching properties of the system with the
hybrid controller, respectively. Notably, the Sum-of-Squares (SOS) relaxation
technique is applied to encode the polynomial optimization problem as an SOS
problem with bilinear matrix inequality (BMI) constraints.


4.1 Constructing Polynomial Simulations of the Controller Network 


In the following, we assume without loss of generality that the control input
u is one-dimensional, for ease of presentation. Consider a controlled CCDS
C = (f, Ψ, Θ) with f defined by (1), an unsafe set X_u, and a goal set X_g.
Suppose the hybrid controller learned for the safety and goal-reaching
requirements is u(x) = p(x) + k(x). Here k(x) is a small neural network with
the square function as its activation function in the hidden layer and tanh in
the output layer, i.e., k(x) = tanh(h(x)), where h is a polynomial, in fact the
composition of an affine function and a square function. We replace the
non-polynomial term occurring in the controller part of the vector field
f(x, u) by introducing x_{n+1} = tanh(h(x)). Then ẋ = f(x, u) is transformed
into a polynomial system:


$$\begin{aligned}
\dot{x} &= f(x,\; p(x) + x_{n+1}), \\
\dot{x}_{n+1} &= (1 - x_{n+1}^2)\,\dot{h}(x).
\end{aligned} \tag{3}$$


For simplicity, we denote the vector field of (3) by f̂ ∈ R[x̂]^{n+1}.
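A small simulation can make the recasting concrete. The sketch below uses a hypothetical scalar plant `f`, polynomial part `p`, and network polynomial `h` (all assumptions chosen for illustration), integrates both the original closed loop and the recast system (3) with a classical Runge-Kutta scheme, and lets one compare the two.

```python
import math

def f(x, u):          # hypothetical scalar plant: x' = -x + u
    return -x + u

def p(x):             # hypothetical polynomial part of the controller
    return -0.5 * x

def h(x):             # polynomial part of the small network: affine, square, affine
    return 0.8 * (0.6 * x + 0.1) ** 2 - 0.3

def dh_dx(x):
    return 0.8 * 2.0 * (0.6 * x + 0.1) * 0.6

def recast(state):
    """Vector field (3) on the augmented state (x, x_{n+1})."""
    x, y = state
    x_dot = f(x, p(x) + y)
    y_dot = (1.0 - y * y) * dh_dx(x) * x_dot   # (1 - x_{n+1}^2) * h_dot
    return (x_dot, y_dot)

def original(state):
    """Original closed loop with the tanh term kept inside the controller."""
    (x,) = state
    return (f(x, p(x) + math.tanh(h(x))),)

def rk4(field, state, dt, steps):
    """Classical fourth-order Runge-Kutta integration."""
    for _ in range(steps):
        k1 = field(state)
        k2 = field(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = field(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = field(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

x0 = 1.0
(x_orig,) = rk4(original, (x0,), dt=0.01, steps=200)
# The new variable must start on the graph of tanh(h(.)).
x_rec, y_rec = rk4(recast, (x0, math.tanh(h(x0))), dt=0.01, steps=200)
```

Up to integration error, `x_rec` matches `x_orig` and `y_rec` stays on the graph of tanh(h(x)), which is the sense in which the two systems are equivalent.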

Besides the vector field, we need to transform Θ, Ψ, X_u, and X_g accordingly
because of the newly introduced variable. For instance, the initial set should
be specified by Θ̂ := {(x, x_{n+1}) ∈ R^{n+1} | x ∈ Θ, x_{n+1} = tanh(h(x))}.
In practice, Θ̂ can be abstracted by a polynomial inclusion. For the initial
set Θ, we first compute a hyper-rectangle J := {x ∈ R^n | ∧_i l_i ≤ x_i ≤ u_i}
as an over-approximation of the bounded compact set Θ through interval
analysis; we then compute a Taylor model for the term tanh(h(x)) on J and
obtain p_1(x) − δ_1 ≤ x_{n+1} ≤ p_1(x) + δ_1. This gives the polynomial
abstraction Θ̂ of Θ. For brevity, let x̂ denote the variable vector extended
with the introduced variable x_{n+1}, i.e., x̂ = (x, x_{n+1}) =
(x_1, ..., x_n, x_{n+1})^T. Likewise, the other sets Ψ, X_u, and X_g can be
handled in the same manner, yielding the associated polynomial abstractions
Ψ̂, X̂_u, and X̂_g. The above polynomial abstractions can be written as follows:


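$$\begin{aligned}
\hat{\Theta} &= \{\hat{x} \in \mathbb{R}^{n+1} \mid x \in \Theta,\ |x_{n+1} - p_1(x)| \le \delta_1\}, \\
\hat{\Psi} &= \{\hat{x} \in \mathbb{R}^{n+1} \mid x \in \Psi,\ |x_{n+1} - p_2(x)| \le \delta_2\}, \\
\hat{X}_u &= \{\hat{x} \in \mathbb{R}^{n+1} \mid x \in X_u,\ |x_{n+1} - p_3(x)| \le \delta_3\}, \\
\hat{X}_g &= \{\hat{x} \in \mathbb{R}^{n+1} \mid x \in X_g,\ |x_{n+1} - p_4(x)| \le \delta_4\}.
\end{aligned} \tag{4}$$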
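The shape of such an abstraction can be sketched as follows. This is not a rigorous Taylor model: a real one bounds the remainder with interval arithmetic, whereas the sketch below only estimates δ_1 by dense sampling. The box, the polynomial `h`, and the expansion order are hypothetical choices.

```python
import math

def h(x):
    # Hypothetical polynomial part of the small network (affine, square, affine).
    return 0.8 * (0.6 * x + 0.1) ** 2 - 0.3

def taylor_tanh(z, z0):
    """Second-order Taylor polynomial of tanh around z0."""
    t = math.tanh(z0)
    d1 = 1.0 - t * t            # tanh'(z0)
    d2 = -2.0 * t * d1          # tanh''(z0)
    return t + d1 * (z - z0) + 0.5 * d2 * (z - z0) ** 2

lo, hi = -1.0, 1.0                       # box over-approximating the set
grid = [lo + (hi - lo) * i / 2000 for i in range(2001)]
z0 = h(0.5 * (lo + hi))                  # expand around the image of the box centre

def p1(x):
    # Polynomial in x, because h is polynomial and taylor_tanh is polynomial in z.
    return taylor_tanh(h(x), z0)

# Sampled (non-rigorous) estimate of the remainder bound delta_1.
delta1 = max(abs(math.tanh(h(x)) - p1(x)) for x in grid)

def in_abstraction(x, x_next, slack=1e-12):
    """Membership in {(x, x_{n+1}) : lo <= x <= hi, |x_{n+1} - p1(x)| <= delta1}."""
    return lo <= x <= hi and abs(x_next - p1(x)) <= delta1 + slack
```

Every sampled point of the exact set (x, tanh(h(x))) lies in the band, while points far from the graph are excluded; a sound tool would replace the sampled `delta1` by a certified bound.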


Finally, we obtain a polynomial CCDS Ĉ = (f̂, Ψ̂, Θ̂). Therefore, if x(t) is
a trajectory of system (1) within the domain specified by Ψ starting from some
initial state x(t_0) ∈ Θ, then x̂(t) is a trajectory of system (3) within the
relaxed domain specified by Ψ̂ starting from the initial state x̂(t_0) ∈ Θ̂ with
x_{n+1}(t_0) = tanh(h(x(t_0))).

Theorem 3. If the controlled CCDS Ĉ = (f̂, Ψ̂, Θ̂), with f̂ defined by (3) and
with Θ̂, Ψ̂, and X̂_u defined by (4), is safe, then the original CCDS
C = (f, Ψ, Θ) with the given unsafe set X_u is safe. Moreover, if B(x̂) is a
barrier certificate of Ĉ w.r.t. X̂_u, then B(x, tanh(h(x))) is also a barrier
certificate of C w.r.t. X_u.


Proof. Without loss of generality, assume that x(t), t ≥ 0, is a trajectory
of the controlled CCDS C starting from the initial state x(t_0) ∈ Θ; then x̂(t)
with x_{n+1}(t) = tanh(h(x(t))) is a trajectory of Ĉ starting from the initial
state x̂(t_0) ∈ Θ̂. The safety of Ĉ implies that no trajectory of Ĉ from the
initial set Θ̂ can reach any unsafe state specified by X̂_u, which in turn
implies that no trajectory of C from the initial state x(t_0) can reach any
state specified by X_u. Furthermore, the vector field f̂ is obtained from f by
an equivalent transformation, and Θ̂, Ψ̂, and X̂_u are the associated
polynomial abstractions. Therefore, B(x, tanh(h(x))) is a barrier certificate
of the CCDS C.


Theorem 4. If the controlled CCDS Ĉ = (f̂, Ψ̂, Θ̂), with f̂ defined by (3) and
with Θ̂, Ψ̂, and X̂_g defined by (4), is goal-reaching, then the original CCDS
C = (f, Ψ, Θ) with the given goal set X_g is goal-reaching. Moreover, if V(x̂)
is a Lyapunov-like function of Ĉ w.r.t. X̂_g, then V(x, tanh(h(x))) is a
Lyapunov-like function of C w.r.t. X_g.


Proof. Suppose the CCDS C is not goal-reaching for the given goal set X_g. Then
∃ε > 0 and ∃x_0 ∈ Θ such that ||x(t)||_{X_g} > ε, ∀t ≥ 0. The state x̂(t) ∈ Ψ̂
with x_{n+1}(t) = tanh(h(x(t))), starting from the initial state x̂(t_0), then
satisfies

$$\|\hat{x}(t)\|_{\hat{X}_g} > \epsilon \tag{5}$$

because, according to (4), X̂_g is obtained just by introducing a new variable
without changing the projection onto the first n dimensions, i.e., X_g. By the
assumption of the theorem, the CCDS Ĉ is goal-reaching, so ∃T > 0 such that
||x̂(T)||_{X̂_g} < ε, which contradicts (5). As in Theorem 3, V(x, tanh(h(x)))
is a Lyapunov-like function of C w.r.t. X_g. This completes the proof.


4.2 Producing Barrier Certificate and Lyapunov-Like Function 


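For simplicity, hereafter we denote Θ̂, Ψ̂, X̂_u, and X̂_g as follows.

$$\begin{aligned}
\hat{\Theta} &= \{\hat{x} \in \mathbb{R}^{n+1} \mid \textstyle\bigwedge_{i=1}^{m_1} g_i(\hat{x}) \ge 0\}, &
\hat{\Psi} &= \{\hat{x} \in \mathbb{R}^{n+1} \mid \textstyle\bigwedge_{j=1}^{m_2} h_j(\hat{x}) \ge 0\}, \\
\hat{X}_u &= \{\hat{x} \in \mathbb{R}^{n+1} \mid \textstyle\bigwedge_{k=1}^{m_3} q_k(\hat{x}) \ge 0\}, &
\hat{X}_g &= \{\hat{x} \in \mathbb{R}^{n+1} \mid \textstyle\bigwedge_{l=1}^{m_4} s_l(\hat{x}) \ge 0\}.
\end{aligned}$$

Barrier Certificate Generation. Assume that the barrier function B(x̂) is
a polynomial of degree at most d, whose coefficients form a vector space of
dimension s(d) = C(n+1+d, d) (a binomial coefficient) with the canonical basis
(x̂^α) of monomials. Suppose the coefficients are unknown, denote by
b = (b_α) ∈ R^{s(d)} the coefficient vector of B(x̂), and write

$$B(\hat{x}, b) = \sum_{\alpha \in \mathbb{N}^{n+1}_{d}} b_\alpha \hat{x}^{\alpha}
 = \sum_{\alpha \in \mathbb{N}^{n+1}_{d}} b_\alpha\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_{n+1}^{\alpha_{n+1}}$$

in the canonical basis. As stated in Theorem 1 and Theorem 3, the controlled
CCDS C is safe under the designed controller if there exists such a barrier
certificate B(x̂, b) for the CCDS Ĉ. Determining the existence of a barrier
certificate B(x̂, b) can be represented as the following feasibility problem.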


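$$\begin{aligned}
&\text{find } b \\
&\text{s.t. } B(\hat{x}, b) \ge 0, && \forall \hat{x} \in \hat{\Theta}, \\
&\phantom{\text{s.t. }} \mathcal{L}_{\hat{f}} B(\hat{x}, b) > 0, && \forall \hat{x} \in \hat{\Psi} \text{ and } B(\hat{x}, b) = 0, \\
&\phantom{\text{s.t. }} B(\hat{x}, b) < 0, && \forall \hat{x} \in \hat{X}_u.
\end{aligned} \tag{6}$$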
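To make the template and the three conditions of (6) tangible, here is a minimal sketch that enumerates the monomial basis and tests a candidate certificate on samples. The one-dimensional system, the sets, and the candidate B(x) = 1 − x² are hypothetical; sampling can only falsify a candidate, it does not prove the conditions.

```python
import itertools
import math

def monomial_exponents(num_vars, degree):
    """All exponent tuples alpha with |alpha| <= degree (the canonical basis)."""
    return [a for a in itertools.product(range(degree + 1), repeat=num_vars)
            if sum(a) <= degree]

def evaluate(coeffs, point):
    """Evaluate sum_alpha b_alpha * x^alpha for a dict {alpha: b_alpha}."""
    return sum(c * math.prod(x ** e for x, e in zip(point, alpha))
               for alpha, c in coeffs.items())

# Hypothetical 1-D instance: x' = -x, initial set [-0.5, 0.5],
# unsafe set {|x| >= 2}, domain [-3, 3].
basis = monomial_exponents(num_vars=1, degree=2)
b = {(0,): 1.0, (1,): 0.0, (2,): -1.0}          # candidate B(x) = 1 - x^2

def lie_derivative(x):
    # dB/dx * f(x) with dB/dx = -2x and f(x) = -x.
    return (-2.0 * x) * (-x)

grid = [-3.0 + 6.0 * i / 600 for i in range(601)]
init_ok = all(evaluate(b, (x,)) >= 0 for x in grid if abs(x) <= 0.5)
unsafe_ok = all(evaluate(b, (x,)) < 0 for x in grid if abs(x) >= 2.0)
boundary_ok = all(lie_derivative(x) > 0 for x in (-1.0, 1.0))   # zero level set of B
```

The length of `basis` matches the dimension formula s(d) for one variable and degree two; the SOS encoding that follows replaces such sampled checks by algebraic certificates.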


Moreover, the Sum-of-Squares (SOS) relaxation technique is applied to encode
the optimization problem (6) as an SOS program. Given a basic semi-algebraic
set K defined by K = {x̂ ∈ R^{n+1} | g_1(x̂) ≥ 0, ..., g_s(x̂) ≥ 0}, where
g_i(x̂) ∈ R[x̂], 1 ≤ i ≤ s, a sufficient condition for the nonnegativity of a
given polynomial f(x̂) on the semi-algebraic set K is

$$f(\hat{x}) = \sigma_0(\hat{x}) + \sum_{i=1}^{s} \sigma_i(\hat{x})\, g_i(\hat{x}), \tag{7}$$

where σ_0(x̂) and σ_i(x̂), 1 ≤ i ≤ s, are SOS polynomials in Σ[x̂]. Thus, the
representation (7) ensures that the polynomial f(x̂) is nonnegative on the
given semi-algebraic set K.
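As a small worked instance of (7) (our own illustration, not taken from the benchmarks), take a single variable x, the set K = {x | g_1(x) = 1 − x² ≥ 0} = [−1, 1], and f(x) = x + 1. One valid choice of multipliers is σ_0(x) = ½(x + 1)² and σ_1(x) = ½:

```latex
\begin{aligned}
\sigma_0(x) + \sigma_1(x)\,g_1(x)
  &= \tfrac{1}{2}(x+1)^2 + \tfrac{1}{2}(1 - x^2) \\
  &= \tfrac{1}{2}(x^2 + 2x + 1) + \tfrac{1}{2} - \tfrac{1}{2}x^2 \\
  &= x + 1 = f(x).
\end{aligned}
```

Both multipliers are sums of squares, so f ≥ 0 on K follows from the identity alone, without evaluating f pointwise.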

Observing (6), the polynomial L_f̂ B(x̂, b) involves the uncertain variable ε
in the range [−μ*, μ*], which can be written as h(ε) ≥ 0 with

$$h(\varepsilon) = (\varepsilon + \mu^*)(\mu^* - \varepsilon).$$


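Thus, the problem (6) can be transformed into the following optimization
problem through SOS relaxation:

$$\begin{aligned}
&\text{find } b \\
&\text{s.t. } B(\hat{x}, b) - \sum_{i} \sigma_i(\hat{x})\, g_i(\hat{x}) \in \Sigma[\hat{x}], \\
&\phantom{\text{s.t. }} \mathcal{L}_{\hat{f}} B(\hat{x}, b) - \lambda(\hat{x}) B(\hat{x}, b) - \sum_{j} \phi_j(\hat{x})\, h_j(\hat{x}) - \nu(\hat{x}, \varepsilon)\, h(\varepsilon) - \epsilon \in \Sigma[\hat{x}, \varepsilon], \\
&\phantom{\text{s.t. }} -B(\hat{x}, b) - \epsilon' - \sum_{k} \kappa_k(\hat{x})\, q_k(\hat{x}) \in \Sigma[\hat{x}],
\end{aligned} \tag{8}$$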


where ϵ, ϵ' > 0, the multipliers σ_i(x̂), φ_j(x̂), κ_k(x̂) ∈ Σ[x̂],
ν(x̂, ε) ∈ Σ[x̂, ε], and λ(x̂) ∈ R[x̂]. Note that ϵ and ϵ' are needed to ensure
the strict positivity of polynomials required by the second and third
constraints in (6). The feasibility of the constraints in (8) is sufficient to
imply the feasibility of the constraints in (6).

Investigating (8), the product of the undetermined coefficients of λ(x̂) and
B(x̂, b) in the second constraint turns the problem into a bilinear matrix
inequality (BMI) problem, which can be solved by calling the Matlab package
PENBMI [18].

Note that the existence of a feasible solution b* to the problem (8) implies
that the system is guaranteed to be safe under the designated controller
u(x) = p(x) + k(x).
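Each membership "∈ Σ[·]" in (8) is typically handled through a Gram-matrix formulation: a polynomial is SOS exactly when it equals z^T Q z for some positive semidefinite matrix Q, with z a vector of monomials. The sketch below illustrates this on a hypothetical univariate polynomial with pure-Python helpers of our own; it is not the PENBMI workflow.

```python
import itertools

def det(m):
    """Determinant by cofactor expansion (fine for the tiny matrices used here)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def is_psd(q, tol=1e-12):
    """A symmetric matrix is PSD iff all of its principal minors are nonnegative."""
    n = len(q)
    for size in range(1, n + 1):
        for idx in itertools.combinations(range(n), size):
            if det([[q[i][j] for j in idx] for i in idx]) < -tol:
                return False
    return True

def gram_polynomial(q, x):
    """Evaluate z(x)^T Q z(x) for the monomial vector z(x) = (1, x, x^2)."""
    z = [1.0, x, x * x]
    return sum(z[i] * q[i][j] * z[j] for i in range(3) for j in range(3))

def target(x):
    return x ** 4 + 2.0 * x ** 2 + 1.0      # (x^2 + 1)^2, an SOS polynomial

# Two Gram matrices of the same polynomial: the x^2 coefficient can be split
# between the (1, x^2) cross term and the (x, x) diagonal term.
q_good = [[1.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
q_bad = [[1.0, 0.0, 2.0], [0.0, -2.0, 0.0], [2.0, 0.0, 1.0]]
```

Both matrices reproduce the polynomial, but only `q_good` is positive semidefinite, so the solver's task is to search the affine family of Gram matrices for a PSD member. In (8) the entries of that matrix depend on products of the coefficients of λ and B, which is why the search becomes bilinear.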

Lyapunov-like Function Computation. We also want to establish that the
learned controller is not only safe but also goal-reaching, in the sense of
driving the system to converge to the specified goal set. As stated in Theorem
2, the existence of a Lyapunov-like function suffices to prove that the
system's behaviors asymptotically converge to the specified goal set X_g. In a
similar manner, we first formalize the goal-reaching verification for the
system C through Theorem 2 and Theorem 4. Assume that the Lyapunov-like
function V(x̂) is a polynomial of degree at most d', whose coefficients form a
vector space of dimension s(d') = C(n+1+d', d') with the canonical basis (x̂^α)
of monomials. We introduce the coefficient parameters of the Lyapunov-like
function V(x̂), denoted by the vector v = (v_α) ∈ R^{s(d')}, and write

$$V(\hat{x}, v) = \sum_{\alpha \in \mathbb{N}^{n+1}_{d'}} v_\alpha \hat{x}^{\alpha}
 = \sum_{\alpha \in \mathbb{N}^{n+1}_{d'}} v_\alpha\, x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_{n+1}^{\alpha_{n+1}}$$

in the canonical basis. By Theorem 4, showing that the controlled CCDS C is
goal-reaching under the designed controller reduces to showing that the CCDS Ĉ
is goal-reaching, which holds if there exists such a Lyapunov-like function
V(x̂, v). The existence of a Lyapunov-like function can be decided by tackling
the following feasibility problem:


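$$\begin{aligned}
&\text{find } v \\
&\text{s.t. } \emptyset \ne \{\hat{x} : V(\hat{x}, v) \le 0\} \subseteq \hat{X}_g, \\
&\phantom{\text{s.t. }} \mathcal{L}_{\hat{f}} V(\hat{x}, v) \le -\beta(V(\hat{x}, v)), \quad \forall \hat{x} \in \hat{\Psi}.
\end{aligned} \tag{9}$$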
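The two conditions of (9) can be tested on samples for a toy instance. The sketch below assumes a hypothetical scalar closed loop x' = −x, a candidate V(x) = x² − 0.01, a goal set {x : x² ≤ 0.1²}, and β(s) = r·s with r = 1; as with any sampling check, it can refute a candidate but does not certify it.

```python
def f(x):
    return -x                     # hypothetical closed-loop dynamics x' = -x

def V(x):
    return x * x - 0.01           # candidate Lyapunov-like function

def lie_V(x):
    return 2.0 * x * f(x)         # dV/dx * f(x)

def beta(s, r=1.0):
    return r * s                  # extended class-K function beta(s) = r*s

def in_goal(x):
    return x * x <= 0.1 ** 2      # goal set {x : x^2 <= 0.1^2}

grid = [-3.0 + 6.0 * i / 6000 for i in range(6001)]   # samples of the domain [-3, 3]
sublevel = [x for x in grid if V(x) <= 0]

nonempty = len(sublevel) > 0                              # {V <= 0} is not empty
contained = all(in_goal(x) for x in sublevel)             # {V <= 0} inside the goal set
decreasing = all(lie_V(x) <= -beta(V(x)) for x in grid)   # decrease condition
```

Here the decrease condition reads −2x² ≤ −(x² − 0.01), i.e. −x² ≤ 0.01, which holds everywhere, so V decays until the trajectory enters the sublevel set inside the goal.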


Similarly, we encode the uncertain variable ε in the range [−μ, μ] into
h(ε) ≥ 0 with h(ε) := (ε + μ)(μ − ε), where ε enters the polynomial
L_f̂ V(x̂, v) through the controller u. For the given goal set X̂_g, the
constraint {x̂ : V(x̂, v) ≤ 0} ≠ ∅ can be encoded by V(x̂_0, v) ≤ 0 for a point
x̂_0 ∈ X̂_g.

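Based on the above encodings, the problem (9) can be transformed into the
following constrained polynomial optimization problem:

$$\begin{aligned}
&\text{find } v \\
&\text{s.t. } s_i(\hat{x}) + \sigma'_i(\hat{x})\, V(\hat{x}, v) \in \Sigma[\hat{x}], \\
&\phantom{\text{s.t. }} -\mathcal{L}_{\hat{f}} V(\hat{x}, v) - \beta(V(\hat{x}, v)) - \sum_{j} \phi'_j(\hat{x})\, h_j(\hat{x}) - \nu'(\hat{x}, \varepsilon)\, h(\varepsilon) \in \Sigma[\hat{x}, \varepsilon], \\
&\phantom{\text{s.t. }} -V(\hat{x}_0, v) \in \Sigma[\hat{x}],
\end{aligned} \tag{10}$$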


where 1 ≤ i ≤ m_4, 1 ≤ j ≤ m_2, the multipliers σ'_i(x̂), φ'_j(x̂) ∈ Σ[x̂], and
ν'(x̂, ε) ∈ Σ[x̂, ε]. For simplicity, we take the extended class-K function
β(·) to be β(x) = x or β(x) = r·x (r > 0).
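To see why the first constraint of (10) enforces the inclusion required in (9), the following short derivation may help. It is our own elaboration of the standard argument, writing X̂_g = {x̂ | s_i(x̂) ≥ 0, 1 ≤ i ≤ m_4} and using only that SOS polynomials are nonnegative:

```latex
\begin{aligned}
& s_i(\hat{x}) + \sigma'_i(\hat{x})\,V(\hat{x}, v) \in \Sigma[\hat{x}]
  \;\Longrightarrow\;
  s_i(\hat{x}) \ge -\sigma'_i(\hat{x})\,V(\hat{x}, v) \quad \forall \hat{x}, \\
& V(\hat{x}, v) \le 0 \ \text{and}\ \sigma'_i(\hat{x}) \ge 0
  \;\Longrightarrow\;
  -\sigma'_i(\hat{x})\,V(\hat{x}, v) \ge 0
  \;\Longrightarrow\;
  s_i(\hat{x}) \ge 0 \quad (1 \le i \le m_4), \\
& \text{hence } \{\hat{x} : V(\hat{x}, v) \le 0\} \subseteq \hat{X}_g .
\end{aligned}
```

The third constraint, −V(x̂_0, v) ≥ 0 at a fixed point x̂_0, supplies the non-emptiness half of the condition.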

In summary, the safety and goal-reaching verification problem is transformed
into a BMI problem by combining (8) and (10) for the parameters b and v. A
solution b* to problem (8) yields a barrier certificate B(x̂, b*), which means
that the closed-loop system under the designed controller u(x) = p(x) + k(x)
is safe. A solution v* to (10) produces a Lyapunov-like function V(x̂, v*),
which means that the system asymptotically converges to the specified goal set
X_g.


5 Experiments


In this section, we first present a nonlinear system to illustrate our
approach, and then report an experimental evaluation of our method on a set of
benchmark examples and compare it with two alternative methods. All
experiments are conducted on a 3.2 GHz AMD Ryzen 7 3700X CPU under Windows 10
with 16 GB RAM.


Example 1 (Academic 3D Model [6]). Consider the following continuous
dynamical system in the plant:

$$\begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{z} \end{bmatrix}
 = \begin{bmatrix} z + 8y \\ -y + z \\ -z - x^2 + u \end{bmatrix}$$

The system domain is Ψ = {x = (x, y, z)^T ∈ R^3 | −5 ≤ x, y, z ≤ 5}. Our
goal is to design a control law u = p(x) + k(x) such that all trajectories of
the closed-loop system under u starting from the initial set

$$\Theta = \{\mathbf{x} \in \mathbb{R}^3 \mid (x + 0.75)^2 + (y + 1)^2 + (z + 0.4)^2 \le 0.35^2\}$$

will never enter the unsafe region

$$X_u = \{\mathbf{x} \in \mathbb{R}^3 \mid (x + 0.3)^2 + (y + 0.36)^2 + (z - 0.2)^2 \le 0.30^2\},$$

and eventually enter the goal set X_g = {x ∈ R^3 | x² + y² + z² ≤ 0.1²}.
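The problem instance of Example 1 can be written down directly. The sketch below encodes the plant and the three balls as we read them from the text, and adds two sanity checks of our own (pairwise disjointness and containment in the domain); the helper names are hypothetical.

```python
import math

def plant(state, u):
    """Vector field of Example 1 as read from the text (u is the control input)."""
    x, y, z = state
    return (z + 8.0 * y, -y + z, -z - x * x + u)

# Balls given as (centre, radius); the domain is the box [-5, 5]^3.
THETA = ((-0.75, -1.0, -0.4), 0.35)     # initial set
X_U = ((-0.3, -0.36, 0.2), 0.30)        # unsafe region
X_G = ((0.0, 0.0, 0.0), 0.1)            # goal set

def in_ball(state, ball):
    centre, radius = ball
    return math.dist(state, centre) <= radius

def disjoint(ball_a, ball_b):
    """Two closed balls are disjoint iff their centres are farther apart than r_a + r_b."""
    return math.dist(ball_a[0], ball_b[0]) > ball_a[1] + ball_b[1]

def inside_domain(ball, bound=5.0):
    centre, radius = ball
    return all(abs(c) + radius <= bound for c in centre)
```

With these readings, the initial, unsafe, and goal sets are pairwise disjoint and lie inside the domain, so the reach-avoid task is well posed.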

For the controller learning process, we train NN structures of increasing
depth and width as controller templates until a desired controller is
obtained. We eventually obtained a DNN controller with 5 hidden layers, each
consisting of 128 neurons, but failed for smaller sizes. Based on this
learned DNN controller, we construct a hybrid controller for the system. The
polynomial part p(x) is obtained by the sampling-based method as follows:

$$\begin{aligned}
p(\mathbf{x}) = {} & 0.125 - 3.333x - 5.726y - 10.669z + 1.911x^2 + 1.212xy \\
 & + 2.138xz - 1.332y^2 - 10.07yz - 12.952z^2.
\end{aligned}$$


The hybrid controller is then constructed as p(x) + k(x|θ'), where k(x|θ') is
a small NN with one hidden layer. After retraining with p(x) + k(x|θ') in the
closed loop, we obtain the NN part with one hidden layer containing 30 neurons.
Under the hybrid controller p(x) + k(x), the controlled system can be verified
to satisfy the safety and goal-reaching properties by the following barrier
certificate B(x, tanh(h(x))) and Lyapunov-like function V(x, tanh(h(x))),
respectively,


$$\begin{aligned}
B &= 0.641x^2 - 0.143xy + 0.554y^2 + \cdots + 0.004\tanh(h(\mathbf{x})) - 0.353z + 0.061, \\
V &= 0.09x^2 - 0.311xy + \cdots + 0.0123\tanh(h(\mathbf{x})) - 0.033x - 0.024z - 0.01,
\end{aligned}$$


where h(x) = 2.248x² + 0.962xy + ··· − 0.389z + 9.051.

Fig. 3. Phase portrait of the system in Example 1. Subfigure (a) shows that the
zero level set of the barrier certificate B(x) (the blue surface) separates the
unsafe region X_u (the red ball) from the initial set Θ (the yellow ball).
Subfigure (b) shows that all trajectories (in different colors) from Θ (the
yellow ball) reach X_g (the green ball). (Color figure online)


Figure 3(a) shows the zero level set of the barrier certificate in blue,
which separates X_u (the red ball) from all trajectories starting from Θ (the
yellow ball), and Fig. 3(b) shows simulated trajectories of the system
converging to the goal set X_g (the green ball) under the learned hybrid
controller. Therefore, we conclude that the system is guaranteed to be safe
and goal-reaching from the initial set under our learned hybrid controller.


Although a DNN policy learned by RL may appear to work well in many
applications, it is difficult to assert any strong and provable claims about
its correctness, since the neurons, layers, weights, and biases are far removed
from the intent of the actual controller. As found in [32], state-of-the-art
neural network verifiers are ineffective for verifying a neural controller
over an infinite time horizon with complex system dynamics. The idea is
therefore to learn a controller that admits formal reasoning about the
specified property. In the remainder of this section, we conduct the following
research experiments:

RE1: Explore directly learning a polynomial controller to control the system 
and guarantee its safety and goal-reaching requirements. 

From the verification standpoint, one may ask why not directly learn a
polynomial controller for the system (without appealing to the neural policy
at all), using reinforcement learning to synthesize its unknown parameters. We
therefore first tried training a controller network with the commonly used
square activation function. After training on a data set of 250 trajectories
with 3000 data points each, the result was unsuccessful for different network
structures (of up to 5 layers and 250 neurons): the simulated system under the
trained polynomial controller still failed to meet the requirements.
Similarly, Zhu et al. [32] found that, despite many experiments on tuning
learning rates and rewards, directly training a linear control program to
conform to their specification with either reinforcement learning (e.g., policy
gradient) or random search was unsuccessful because of undesirable overfitting,
even for an example as simple as the inverted pendulum.

RE2: Explore the effects of using just a polynomial or a small NN to imitate 
the original DNN to avoid the hybrid form. 

Our method uses RL to obtain a well-performing DNN controller in general form
and then, guided by the learned DNN, designs a hybrid controller that is
verifiable for the safety and goal-reaching properties. We next evaluate the
hybrid controller synthesis and compare its verification performance with two
other RL-guided controller synthesis methods:

(RE2-1) Obtain a polynomial controller by imitating and abstracting the
trained DNN controller; with the abstracted polynomial controller, the
resulting verification of the control system can naturally be encoded as a
polynomial constraint solving problem;

(RE2-2) Abstract the DNN controller by knowledge distillation to obtain a
small network with a simple structure, which is expected to maintain the
safety and goal-reaching behavior of the original network (on the data set)
[11]. Since the posterior verification cannot avoid approximating the neural
network with a polynomial, and the upper bound of the approximation error is
positively related to the Lipschitz constant, the distilled small network is
expected to make the verification more likely to succeed thanks to its smaller
Lipschitz constant.


Table 1. Performance Evaluation

Ex         n_x  d_f  NN Struc.                 Hyb. design        Poly. (RE2-1)             Distil. (RE2-2)
                     u_0(x)       k(x)         d_B,d_V  T_v(s)    d_p  d_B,d_V  T_p(s)      d_B,d_V  T_d(s)
C1 [17]    2    3    2-128(4)-1   2-20-1       2,2      4.953     5    2,2      21.18       2,2      3.507
C2 [31]    2    3    2-64(4)-1    2-20-1       2,2      4.877     4    2,2      27.61       2,4      8.492
C3 [32]    2    3    2-64(5)-1    2-20-1       2,2      3.813     ×    2,×      ×           2,4      10.56
C4 [3]     2    4    2-64(4)-1    2-20-1       2,2      8.763     5    4,4      82.92       4,2      10.16
C5 [6]     3    2    3-128(5)-1   3-30-1       2,2      11.70     ×    4,×      ×           4,×      ×
C6 [6]     3    4    3-128(5)-1   3-30-1       2,2      19.42     5    4,4      103.4       ×,×      ×
C7 [2]     4    1    4-128(5)-2   4-50-2       4,4      49.26     ×    ×,×      ×           2,×      ×
C8 [17]    4    4    4-128(5)-1   4-40-1       2,2      28.47     6    4,4      229.1       ×,×      ×
C9 [17]    6    3    6-128(5)-2   6-50-2       2,4      64.05     ×    ×,×      ×           ×,×      ×
C10 [19]   7    2    7-128(6)-1   7-50-1       2,2      69.73     ×    ×,×      ×           ×,×      ×


We present a detailed experimental evaluation on a set of benchmarks in
Table 1. The origins of these 10 widely used examples are provided in the first
column; n_x and d_f denote the number of state variables and the maximal degree
of the polynomials (or of the polynomial abstraction by Taylor model for
non-polynomial systems) in the vector fields. The examples have dimension up
to 7. u_0(x) denotes the network structure of the DNN controller synthesized
directly by RL. For example, the trained DNN controller for C1 has 4 hidden
layers with 128 neurons each. All DNNs use ReLU activation functions, except
for tanh on the output layer.

Table 1 shows the performance of the three controller synthesis methods
guided by the well-trained DNN u_0(x), i.e., hybrid controller design,
polynomial controller by imitation (denoted as Poly.), and NN controller by
distillation (denoted as Distil.). For all three methods, verification is
carried out by generating certificate functions. When both a barrier
certificate and a Lyapunov-like function have been obtained, the time costs
are recorded as T_v, T_p, and T_d, respectively, and the degrees of the
obtained certificate functions are recorded as d_B, d_V; otherwise, ‘×’ is
marked when no barrier certificate or Lyapunov-like function can be computed
within the degree bound of 6 and the time bound of 3 hours.

In our hybrid controller design method (i.e., Hyb. design), we uniformly
choose p(x) of degree 2 and k(x) with a single hidden layer, as shown in
column k(x). d_B and d_V denote the degrees of the computed barrier
certificate B(x) and Lyapunov-like function V(x), respectively. T_v in the
last column of this group denotes the verification time cost.

The column Poly. reports the results of the method described in (RE2-1) on
the benchmarks, intended to further justify the need for a hybrid-form
controller. As an ablation study, we use only polynomial approximations of
the original DNNs as surrogate controllers and carry out certificate-based
verification of them. For the sake of control performance, we increase the
degree bound of the polynomial templates to 8 to ensure a high-precision
approximation. d_p denotes the lowest degree of the polynomial surrogate
controllers that pass verification and T_p denotes the corresponding time
cost; ‘×’ means that no such controller is found. The column Distil. reports
the results of the method in (RE2-2) on the benchmarks. In this ablation
study, we distill simpler NNs with a single hidden layer from the original
DNNs and verify the specified properties using the distilled NN controllers.
This process is repeated with the number of hidden-layer neurons of the
distilled NN ranging from 20 up to 50, until we obtain one satisfying the
specified properties, whose verification time cost is recorded as T_d, or fail
to obtain such a simpler NN, denoted by ‘×’ in T_d.

For all 10 examples, we have successfully verified the safety and
goal-reaching properties of the synthesized hybrid controllers via certificate
generation, while the methods based on polynomial surrogate controllers (i.e.,
Poly.) and distilled NN controllers (i.e., Distil.) succeed on 5 and 4
benchmarks, respectively. Moreover, for some examples, the Hyb. design method
finds barrier certificates and Lyapunov-like functions of lower degree.
Consequently, the BMI problems have fewer decision variables than those of the
other methods, which contributes to the effectiveness of the verification
procedure.

We compare the efficiency of the methods in terms of the time spent in the
verification process on the successful examples. T_p is 4.3 to 9.5 times T_v
on the 5 cases where Poly. succeeds. Meanwhile, T_d is about 8.18 seconds on
average, which is 1.46 times the average T_v on the four cases where Distil.
succeeds. Comparing T_v with T_p and T_d, we conclude that verification of the
hybrid controllers is considerably more efficient. To summarize, Table 1 shows
that all the synthesized hybrid controllers have been efficiently verified to
make the systems safe and goal-reaching on a set of commonly used benchmark
examples, which demonstrates that our hybrid polynomial-DNN controller
synthesis method is quite promising.
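The quoted ratios can be recomputed from the timing columns of Table 1. The sketch below transcribes the times as we read them from the table (the decimal points of two Poly. entries, 103.4 and 229.1, are our reading of a damaged scan) and derives the summary figures.

```python
# (T_v, T_p, T_d) in seconds, transcribed from Table 1; None marks a failed run.
times = {
    "C1": (4.953, 21.18, 3.507),
    "C2": (4.877, 27.61, 8.492),
    "C3": (3.813, None, 10.56),
    "C4": (8.763, 82.92, 10.16),
    "C5": (11.70, None, None),
    "C6": (19.42, 103.4, None),
    "C7": (49.26, None, None),
    "C8": (28.47, 229.1, None),
    "C9": (64.05, None, None),
    "C10": (69.73, None, None),
}

# Per-benchmark slowdown of the polynomial surrogate relative to the hybrid design.
poly_ratios = [tp / tv for tv, tp, _ in times.values() if tp is not None]
# Benchmarks on which the distilled controller was verified.
distil_rows = [(tv, td) for tv, _, td in times.values() if td is not None]

mean_td = sum(td for _, td in distil_rows) / len(distil_rows)
mean_tv = sum(tv for tv, _ in distil_rows) / len(distil_rows)

summary = {
    "poly_successes": len(poly_ratios),
    "poly_ratio_range": (round(min(poly_ratios), 1), round(max(poly_ratios), 1)),
    "distil_successes": len(distil_rows),
    "mean_T_d": round(mean_td, 2),
    "T_d_over_T_v": round(mean_td / mean_tv, 2),
}
```

Under this transcription the summary reproduces the figures in the text: 5 and 4 successful cases, a slowdown range of 4.3 to 9.5, a mean T_d of 8.18 s, and a ratio of 1.46.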


6 Conclusion 


This paper has presented an approach to synthesizing hybrid polynomial-DNN
controllers for nonlinear systems such that the closed-loop system is both
well-performing and easily verified against the required properties. Our
approach integrates low-degree polynomial fitting and knowledge distillation
into the RL method during the construction process. Thanks to the special
structure of the hybrid controller, the controlled system can be transformed
into polynomial form. An SOS relaxation based method is applied to generate
barrier certificates and Lyapunov-like functions, which verify the safety and
goal-reaching properties of the nonlinear control systems equipped with our
synthesized hybrid controllers. Experiments on 10 benchmarks demonstrate the
effectiveness and scalability of the proposed approach.


Abstract. A safety verification task involves verifying a system against a
desired safety property under certain assumptions about the environment. 
However, these environmental assumptions may occasionally be violated 
due to modeling errors or faults. Ideally, the system guarantees its critical 
properties even under some of these violations, i.e., the system is robust 
against environmental deviations. This paper proposes a notion of robust- 
ness as an explicit, first-class property of a transition system that captures 
how robust it is against possible deviations in the environment. We model
deviations as a set of transitions that may be added to the original
environment. Our robustness notion then describes the safety envelope of 
this system, i.e., it captures all sets of extra environment transitions for 
which the system still guarantees a desired property. We show that being 
able to explicitly reason about robustness enables new types of system 
analysis and design tasks beyond the common verification problem stated 
above. We demonstrate the application of our framework on case studies 
involving a radiation therapy interface, an electronic voting machine, a 
fare collection protocol, and a medical pump device. 


Keywords: Robustness · Discrete Transition Systems · Model Uncertainty


1 Introduction 


A common type of verification task involves verifying a system (C) against a
desired property (P) under certain assumptions about the environment (E); i.e.,
C || E ⊨ P. Such assumptions may capture, for example, the expected behavior of
a human operator in a safety-critical system, the reliability of the
communication channel in a distributed system, or the capabilities of an
attacker. However, the actual environment (E') may occasionally deviate from
the original model (E), due to changes or faults in the environment entities
(e.g., errors committed by the operator or message loss in the channel). For
certain types of deviations, a system that is robust would ideally be able to
guarantee the property even under the deviated environment; i.e., C || E' ⊨ P.

This paper proposes the notion of robustness as an explicit, first-class
property of a transition system that captures how robust it is against
possible deviations in the environment. A deviation is modeled as a set of
extra transitions that may be added to the original environment, resulting in
a new, deviated environment E' that has a larger set of behaviors than E does.
Then, system C is said to be robust to this deviated environment with respect
to P if and only if it can still guarantee P even in the presence of the
deviation. Finally, the overall robustness of C with respect to E and P,
denoted Δ, is the largest set of deviations that the system is robust against.

Conceptually, Δ defines the safe operating envelopes of the system: as long
as the deployment environment remains within these envelopes, the system can
guarantee a desired property. Being able to explicitly reason about Δ enables
new types of system analysis and design tasks beyond the common verification
problem stated above. Given a pair of alternative system designs, C_1 and C_2,
one could rigorously compare them with respect to their robustness levels;
they both may satisfy property P under the normal operating environment E, but
one may be more robust to deviations than the other. Given two properties, P_1
and P_2 (the latter possibly more critical than the former), one could check
whether the system would continue to guarantee P_2 under a deviated
environment even if it fails to do so for P_1. Finally, given E, P, and a
desired level of robustness, Δ, one could synthesize machine C to be robust
to Δ.

In this paper, we formalize (1) the proposed notion of robustness and (2) the
problem of computing Δ for given C, E, and P. One approach to automatically
compute Δ is a brute-force method that enumerates all possible sets of
deviations; however, as we will show, this approach is impractical, as the
number of deviations is exponential in the size of the environment. To
mitigate this, we present an approach for computing Δ by reduction to a
controller synthesis problem [35,37].
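The brute-force idea can be sketched on a toy model. Everything below is a simplification of our own and not the paper's formal definitions: machines are sets of (source, action, target) triples, composition synchronizes on every action, safety means that no composite state with a bad system state is reachable, and the result is summarized by the maximal safe deviation sets.

```python
import itertools

def reachable_bad(sys_trans, env_trans, init, bad_sys_states):
    """Explore the synchronous product; report whether a bad system state is reachable."""
    frontier, seen = [init], {init}
    while frontier:
        c, e = frontier.pop()
        if c in bad_sys_states:
            return True
        for (c_src, act, c_dst) in sys_trans:
            if c_src != c:
                continue
            for (e_src, e_act, e_dst) in env_trans:
                if e_src == e and e_act == act and (c_dst, e_dst) not in seen:
                    seen.add((c_dst, e_dst))
                    frontier.append((c_dst, e_dst))
    return False

def brute_force_robustness(sys_trans, env_trans, init, bad, candidates):
    """Enumerate every subset of candidate deviations and keep the safe ones."""
    safe, checked = [], 0
    for size in range(len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            checked += 1
            if not reachable_bad(sys_trans, env_trans | set(subset), init, bad):
                safe.append(frozenset(subset))
    # Keep only safe sets that are not strictly contained in another safe set.
    maximal = [d for d in safe if not any(d < other for other in safe)]
    return maximal, checked

# Hypothetical toy model: the system fails if "fire" arrives before "ready".
SYSTEM = {("idle", "select", "armed"), ("armed", "ready", "ok"),
          ("ok", "fire", "idle"), ("armed", "fire", "FAIL")}
ENV = {("e0", "select", "e1"), ("e1", "ready", "e2"), ("e2", "fire", "e0")}
CANDIDATES = [("e1", "fire", "e0"),      # environment fires before "ready"
              ("e0", "select", "e0"),    # environment stays in e0 after "select"
              ("e2", "fire", "e2")]      # environment stays in e2 after "fire"
maximal_sets, num_checked = brute_force_robustness(
    SYSTEM, ENV, ("idle", "e0"), {"FAIL"}, CANDIDATES)
```

With three candidate transitions the loop already performs 2³ model-checking calls, which illustrates why enumeration does not scale with the size of the environment.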

We have built a prototype of the proposed approach for computing robustness
and applied it to several case studies, including models of (1) a radiation
therapy interface, (2) an electronic voting machine, (3) a public
transportation fare collection protocol, and (4) a medical pump device. Our
results show that our approach is capable of computing Δ to provide
information about deviations under which these systems are able to guarantee
their critical safety properties.

The contributions of this paper are as follows: (i) A novel, formal definition
of robustness against environmental deviations (Sect. 4); (ii) A simple,
brute-force method for computing robustness and a more efficient approach
based on controller synthesis (Sect. 5); and (iii) A prototype tool for
computing Δ and an experimental evaluation on several case studies (Sect. 6).


2 Motivating Example


As a motivating example, we consider the Therac-25 radiation therapy machine. 
This machine is infamous for a design flaw that caused radiation overdoses, 
several of which led to the deaths of patients who received treatment [18]. In 
this section, we introduce a model for the Therac-25 based on the descriptions in 
[18] and discuss several methods for analyzing its safety. We show that robustness 
provides a generally richer analysis than classic verification. 


[Figure: (a) the operating terminal, C_term; (b) the turntable, C_turn;
(c) the normative environment, E.]

Fig. 1. The Therac-25 is modeled as C_T25 = C_term || C_beam || C_turn. C_beam
is shown in Fig. 7b.


System. We model the Therac-25 as the composition of the following three 
finite-state machines: (1) $C_{term}$, a computer terminal that nurses use to 
operate the Therac-25, (2) $C_{beam}$, a beam emitter that fires a radiation 
treatment beam in either X-ray or electron mode, and (3) $C_{turn}$, a turntable 
that rotates between two hardware components called the flattener and the 
spreader. Formally, we define the Therac-25 as the composition of all three 
machines: $C_{T25} = C_{term} \| C_{beam} \| C_{turn}$. We show the terminal and 
turntable in Figs. 1a and 1b respectively. We show the beam in Sect. 6.2 
(Fig. 7b), where we present a case study on the Therac-25. 


Environment. Nurses operate the Therac-25 by typing at a keyboard con- 
nected to a terminal. A nurse begins by choosing a beam mode by typing either 
an “x” for X-ray or an “e” for electron mode. The nurse then hits the “enter” key 
and waits for the terminal to display “beam ready” before finally pressing the 
“b” key to fire the beam. This workflow defines the operating environment which 
we call E, shown in Fig. 1c. 


Safety property. Since the X-ray beams contain a high concentration of radi- 
ation, it is imperative that the flattener is in place when the machine fires an 
X-ray. We capture this key safety property in the following LTL [36] formula: 


$\mathbf{G}(\mathit{XFIRED} \rightarrow \mathit{FLATMODE})$ 

In this formula, XFIRED is a predicate that is true if an X-ray beam was just 
fired, while FLATMODE is a predicate that is true when the turntable is in 
flattener mode. We refer to this safety property as $P_{xflat}$ in this example. 


Safety Analyses. Robustness opens our safety analysis beyond classic verifi- 
cation. We discuss several analysis options below. 


(1) Standard Verification: We can check that the Therac-25 is safe within 
the operating environment, that is, $E \| C_{T25} \models P_{xflat}$. Standard 
model checking techniques [2] show that the Therac-25 is indeed safe with 
respect to E. 


(2) Robustness Calculation: Given that the Therac-25 is safe with respect 
to E, we can calculate its robustness A. This calculation identifies the set of safe 
environmental envelopes of the Therac-25. Importantly, these envelopes reveal 
the environmental deviations that the Therac-25 can safely handle. For example, 
in Sect. 6.2, we show that the Therac-25 is robust against the environmental 
deviations in Fig. 8 in which a nurse repeatedly hits “enter” or the “up” arrow 
key after choosing a beam mode. 


(3) Controller Comparison: Holding the environment E and the property 
$P_{xflat}$ constant, we can compare the robustness of the Therac-25 against 
other models. In Sect. 6.2, we introduce the Therac-20 ($C_{T20}$) and compare 
the robustness between $C_{T25}$ and $C_{T20}$. Although both machines are safe 
with respect to the normative environment, we will find that $C_{T25}$ is 
strictly less robust than $C_{T20}$. We will show how contrasting the 
robustness between the two machines exposes a critical software bug in the 
Therac-25. Furthermore, we will show that fixing the bug in the Therac-25 
causes its robustness to be equivalent to that of the Therac-20. 


(4) Property Comparison: Holding the environment E and the machine $C_{T25}$ 
constant, we can compare the machine's robustness with respect to $P_{xflat}$ 
and a second safety property. For example, we could consider a new safety 
property P' that strengthens $P_{xflat}$ by additionally requiring the spreader 
to be in place when a beam is fired in electron mode. The property P' might be 
of interest to avoid an underdose, a situation that might result from the 
flattener being in place when an electron beam is fired. Because P' is stronger 
than $P_{xflat}$, a designer may be interested in comparing the robustness 
between the properties to understand which environmental deviations maintain 
$P_{xflat}$ but violate P'. 


3 Modeling Formalism 


This section describes the underlying formalism used to model the environment, 
controlled systems, and the properties enforced by them. 


Labeled Transition Systems. Given a finite set A, the usual notations |A| 
and A* denote the cardinality of A and the set of all finite sequences over A 
respectively. In this work, we use finite labeled transition systems to model the 
behavior of the environment, the controller, and the property. 


330 R. Meira-Goes et al. 


Definition 1. A labeled transition system (LTS) E is a tuple 
$(Q_E, Act_E, R_E, q_{0,E})$, where $Q_E$ is a finite set of states, $Act_E$ is 
a finite set of actions, $R_E \subseteq Q_E \times Act_E \times Q_E$ is the 
transition relation of E, and $q_{0,E} \in Q_E$ is the initial state. 


LTS E is said to be deterministic if $(q, a, q'), (q, a, q'') \in R_E$ implies 
$q' = q''$; otherwise it is nondeterministic. We extend the transition relation 
$R_E$ to finite sequences of actions as 
$R_E^* \subseteq Q_E \times Act_E^* \times Q_E$ in the usual manner. A trace of 
E is a finite sequence of actions $a_0 \ldots a_n$ of E complying with the 
transitions in $R_E^*$, i.e., $(q_{0,E}, a_0 \ldots a_n, q) \in R_E^*$ for some 
$q \in Q_E$. The set of all traces in E is denoted by beh(E). 

Given LTSs $E_1$ and $E_2$, the parallel composition $\|$ defines the standard 
synchronization of $E_1$ and $E_2$ [2,7]. The composed LTS 
$E_1 \| E_2 = (Q_{E_1} \times Q_{E_2}, Act_{E_1} \cup Act_{E_2}, R_{E_1 \| E_2}, (q_{0,E_1}, q_{0,E_2}))$ 
synchronizes over the common actions between $E_1$ and $E_2$ and interleaves 
the remaining actions. Lastly, given LTSs $E_1$ and $E_2$, we say that $E_1$ is 
a subset of $E_2$, denoted $E_1 \subseteq E_2$, if $Q_{E_1} \subseteq Q_{E_2}$, 
$Act_{E_1} = Act_{E_2}$, $R_{E_1} \subseteq R_{E_2}$, and $q_{0,E_1} = q_{0,E_2}$. 
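The definitions above can be made concrete with a minimal Python sketch. The `LTS` tuple, `compose`, and `is_trace` are illustrative names, not part of the paper or its tool, and the concrete environment (1 -a-> 2 -b-> 3) and one-state controller are an assumed encoding of the running example introduced later, since the figure is not reproduced in the text.

```python
from typing import NamedTuple

class LTS(NamedTuple):
    states: frozenset
    actions: frozenset
    trans: frozenset      # triples (source, action, target)
    init: object

def compose(e1, e2):
    """Parallel composition: synchronize on shared actions, interleave the rest."""
    shared = e1.actions & e2.actions
    trans = set()
    for (q1, a, p1) in e1.trans:
        if a in shared:
            # both components must take the action together
            trans |= {((q1, q2), a, (p1, p2)) for (q2, b, p2) in e2.trans if b == a}
        else:
            trans |= {((q1, q2), a, (p1, q2)) for q2 in e2.states}
    for (q2, a, p2) in e2.trans:
        if a not in shared:
            trans |= {((q1, q2), a, (q1, p2)) for q1 in e1.states}
    states = frozenset((q1, q2) for q1 in e1.states for q2 in e2.states)
    return LTS(states, e1.actions | e2.actions, frozenset(trans), (e1.init, e2.init))

def is_trace(lts, word):
    """True if `word` (a sequence of actions) belongs to beh(lts)."""
    current = {lts.init}
    for a in word:
        current = {p for (q, b, p) in lts.trans if q in current and b == a}
        if not current:
            return False
    return True

# Assumed encoding: environment 1 -a-> 2 -b-> 3; controller with alphabet {a, b}
# that only ever enables a.
E = LTS(frozenset({1, 2, 3}), frozenset("ab"),
        frozenset({(1, "a", 2), (2, "b", 3)}), 1)
C = LTS(frozenset({"c"}), frozenset("ab"), frozenset({("c", "a", "c")}), "c")
closed_loop = compose(E, C)
print(is_trace(E, "ab"), is_trace(closed_loop, "ab"), is_trace(closed_loop, "a"))
```

Because b is in the controller's alphabet but the controller has no b-transition, the composition blocks b, which is how a controller disables an action under synchronous composition.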


Control Strategy. Let an LTS E represent the environmental model to be 
controlled. A control strategy, or simply controller, for E is a function that 
maps a finite sequence of actions to a set of actions, i.e., 
$C : Act_E^* \to 2^{Act_E}$. A controlled trace of E is a trace of E, 
$a_0 \ldots a_n \in beh(E)$, such that $a_i \in C(a_0 \ldots a_{i-1})$ for any 
$i \le n$. The set of all controlled traces, denoted by beh(E/C), defines the 
closed-loop system of C controlling E. For convenience, this closed-loop system 
is denoted by E/C. In this work, we assume that controller C has finite memory 
and can be represented by a deterministic LTS. With an abuse of notation, the 
LTS controller representation is also denoted by C. For convenience, we define 
controller $C = (Q_C, Act_C, R_C, q_{0,C})$ to have the same actions as E, 
i.e., $Act_C = Act_E$. In this manner, the closed-loop system E/C can be 
represented by the composition of environment E and controller C: 
$E/C = E \| C$. 


Remark 1. We assume that all elements of the set of actions $Act_E$ are 
"controllable" actions that can be acted upon by a controller. However, the 
nondeter- 
ministic transition relation of E can be used to model uncontrollable actions of 
the environment. After an action a is selected by the controller at state q, the 
environment decides which state the system will be in, similarly to two-player 
games [15]. 


Safety Property. In this work, we consider a class of regular linear-time prop- 
erties called safety properties over an environment E [2]. A safety property P is 
represented by a deterministic LTS P that defines the set of accepted behaviors. 
Usually, the LTS P encodes both the traces that satisfy P and those that violate 
it by including a sink error state. Formally, any trace that reaches the error state 
$err \in Q_P$ violates the safety property. An LTS E satisfies property P, 
denoted by $E \models P$, whenever the traces in beh(E) do not reach the error 
state in P. In this manner, we can test if $E \models P$ by composing 
$E \| P$ and checking whether err is reached. 
(a) Environment E (b) Controller C (c) Property P 


Fig. 2. LTSs for the running example 


Example 1. We describe a simple example that we use as a running example 
throughout the paper. Figure 2 depicts the environment E, controller C, and 
property P considered in this example. The environment E defines that action a 
is immediately followed by action b. Although controller C in Fig. 2b only shows 
action a, we assume that $Act_C = \{a, b\}$. In this manner, C only allows 
action a to occur. Lastly, property P defines that action a should happen at 
most two times while action b should never happen. It follows that 
$E/C \models P$ since the controller disables action b and the environment only 
executes one instance of action a. 


4 Robustness Against Environmental Deviations 


4.1 Deviations 


A deviation is a set of transitions $d \subseteq Q_E \times Act_E \times Q_E$. 
A deviated system is defined by augmenting the transitions of environment E 
with a deviation set: 


Definition 2. Given an LTS $E = (Q_E, Act_E, R_E, q_{0,E})$ and a deviation 
$d \subseteq Q_E \times Act_E \times Q_E$, we define the deviated system $E_d$ 
as $E_d := (Q_E, Act_E, R_E \cup d, q_{0,E})$. 


A controller C that guarantees property P for environment E, i.e., 
$E/C \models P$, might violate this property for the deviated environment 
$E_d$, i.e., $E_d/C \not\models P$. 


Definition 3. Controller C is a robust controller with respect to environment E, 
deviation d, and property P if $E_d/C \models P$. Deviation d is a robust 
deviation with respect to E, C, and P if C is a robust controller with respect 
to E, d, and P. 


Remark 2. In this paper, we are only interested in ensuring safety properties 
over the controlled system. For this reason, it is sufficient to only consider adding 
new transitions to the environment. If a controlled system is safe, then deleting 
transitions from the environment does not violate the safety property. 


4.2 Comparing Deviations 


Each deviation set affects the environment in different ways. To reason about 
the effects of each deviation set, we compare them using a partial order 
relation over $Q_E \times Act_E \times Q_E$. For deviations $d_1$ and $d_2$ 
such that $d_1 \subseteq d_2$, $d_2$ deviates LTS E more than $d_1$ since 
$beh(E_{d_1}) \subseteq beh(E_{d_2})$. For this reason, we select the relation 
$\subseteq$ over $Q_E \times Act_E \times Q_E$ to be the partial order to 
compare different deviation sets. 


Definition 4. Given E and deviations $d_1$, $d_2$, $d_1$ is at least as 
powerful as $d_2$ if $d_2 \subseteq d_1$. 


4.3 Robustness 


Intuitively, robustness is defined as the set of all possible robust deviations 
d with respect to the environment E, controller C, and safety property 
$P_{saf}$. Additionally, we introduce an environmental constraint, $P_{env}$, to 
capture domain knowledge about the system under analysis. $P_{env}$ filters out 
environment deviations that might not be physically feasible or of interest to 
analyze. This constraint is captured as a safety property over E, i.e., 
$E \models P_{env}$ states that the environment satisfies the constraint. 
Formally, our robustness notion is defined as follows: 


Definition 5. Let environment E, controller C, property $P_{saf}$ such that 
$E/C \models P_{saf}$, and environment constraint $P_{env}$ such that 
$E \models P_{env}$ be given. The robustness of controller C with respect to E, 
$P_{saf}$, and $P_{env}$, denoted by $A(E, C, P_{saf}, P_{env})$, is a set of 
robust deviations $A \subseteq 2^{Q_E \times Act_E \times Q_E}$. A is defined 
to be the (unique) set of robust deviations satisfying the following conditions: 


1. $\forall d \in A.\ E_d/C \models P_{saf}$ [d is robust]; 

2. $\forall d \subseteq Q_E \times Act_E \times Q_E.\ E_d/C \models P_{saf} \wedge E_d \models P_{env} \Rightarrow \exists d' \in A.\ d \subseteq d'$ 
[d is represented]; 

3. $\forall d, d' \in A.\ d \neq d' \Rightarrow d \not\subseteq d'$ [unique representation]; 

4. $\forall d \in A.\ E_d \models P_{env}$ [d is feasible]. 


When E, C, $P_{saf}$, and $P_{env}$ are clear from context, we simply write A. 
The set A is also called the safety envelope of C with respect to E, 
$P_{saf}$, and $P_{env}$. 


Intuitively, the set A defines an upper bound on the possible deviations from E 
that controller C is robust against. In other words, A captures the envelopes for 
which controller C remains safe. 

If a designer does not have domain knowledge about the system, then $P_{env}$ 
can be set to not constrain the environment, i.e., $P_{env} = Act_E^*$. After 
computing A without environmental constraints, a designer can obtain important 
information about the system and the environment. In the next analysis 
iteration, this knowledge can be transformed into environmental constraints to 
enhance the robustness analysis, i.e., $P_{env} \subseteq Act_E^*$. 

By definition, A is always non-empty since $d = \emptyset$ is always robust. Moreover, 
due to conditions 2 and 3, only maximal robust deviations are included in A. 
We show that there is a unique set of deviations that satisfies the conditions of 
Def. 5. The proof of this lemma is available at [27], pg. 23. 


Lemma 1. Given LTS E, controller C, safety property $P_{saf}$, and environment 
property $P_{env}$, there is a unique A that satisfies the conditions in Def. 5. 
Example 2. Back to our running example, we investigate robust deviations 
and A. For simplicity, we do not impose any environment constraint, i.e., 
$P_{env} = Act_E^*$. Figure 3 shows four robust deviations for our running 
example, where transitions in green are deviations added to the environment. 
All robust deviations allow at most two transitions with action a, which is the 
maximum number allowed by the property. In this example, A has three robust 
deviations that are represented in Figs. 3b-3d. Since the robust deviation 
shown in Fig. 3a is a subset of both deviations in Fig. 3b and Fig. 3c, it is 
not included in A. 


[Figure 3: (a) A robust deviated environment. (b) Maximal robust deviated 
environment. (c) Maximal robust deviated environment. (d) Maximal robust 
deviated environment.] 

Fig. 3. Robust deviated environments. Robust transitions 
$Q_E \times \{b\} \times Q_E$ are omitted. 


4.4 Problem Statement 


Although Def. 5 has formally introduced our notion of robustness, it does not 
show how to compute robustness. Therefore, we investigate the problem of com- 
puting the set A. 


Problem 1. Given E, C, $P_{saf}$, and $P_{env}$ as in Def. 5, compute A. 


4.5 Comparing Robustness 


Our robustness definition also allows us to compare the robustness between 
different controllers as well as different safety properties. 


Comparing Controllers. Holding the environment and safety property con- 
stant, we can compare the robustness of the controllers. 


Definition 6. Given an environment E, controllers $C_1$ and $C_2$, safety 
property $P_{saf}$, and environment constraint $P_{env}$, controller $C_1$ is 
at least as robust as $C_2$ if and only if for all 
$d_2 \in A(E, C_2, P_{saf}, P_{env})$ there exists 
$d_1 \in A(E, C_1, P_{saf}, P_{env})$ such that $d_2 \subseteq d_1$. Equality 
and strictly less/more robust are defined in the usual manner using $\subseteq$. 
Comparing Safety Properties. Holding the environment and controller constant, 
we can compare the robustness between safety properties. 


Definition 7. Given an environment E, controller C, safety properties 
$P_{saf,1}$ and $P_{saf,2}$, and environment constraint $P_{env}$, controller C 
is at least as robust with respect to $P_{saf,1}$ as with respect to 
$P_{saf,2}$ if and only if for all $d_2 \in A(E, C, P_{saf,2}, P_{env})$, there 
exists $d_1 \in A(E, C, P_{saf,1}, P_{env})$ such that $d_2 \subseteq d_1$. 
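Definitions 6 and 7 share the same shape, so a single comparison over two already-computed envelopes covers both. The following is a minimal sketch under that reading; the function names and the example transitions are hypothetical and are not taken from the paper's tool.

```python
def at_least_as_robust(delta_1, delta_2):
    """True if every deviation in delta_2 is contained in some deviation of delta_1.

    delta_1 and delta_2 are sets of frozensets of transitions (q, a, q').
    """
    return all(any(d2 <= d1 for d1 in delta_1) for d2 in delta_2)

def strictly_more_robust(delta_1, delta_2):
    """delta_1 dominates delta_2, but not the other way around."""
    return at_least_as_robust(delta_1, delta_2) and not at_least_as_robust(delta_2, delta_1)

# Hypothetical envelopes: the first tolerates everything the second does, plus
# one extra transition.
t1, t2, t3 = (1, "a", 3), (2, "a", 3), (3, "a", 2)
envelope_1 = {frozenset({t1, t2}), frozenset({t3})}
envelope_2 = {frozenset({t1}), frozenset({t3})}
print(at_least_as_robust(envelope_1, envelope_2))    # True
print(strictly_more_robust(envelope_1, envelope_2))  # True
```

To compare two controllers, pass the envelopes computed for each controller; to compare two properties, pass the envelopes computed for the same controller under each property.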


5 Computing Robustness 


This section presents two ways of solving Problem 1. The first is a brute-force 
algorithm, whereas the second uses control techniques to obtain the solution. 
Usually when dealing with regular safety properties, one transforms the safety 
property into an invariance property. This transformation is simply obtained 
by composing the environment with the safety property; then, an invariance 
property equivalent to the safety is defined over this composed system [2]. In 
this composed system, an invariance property is simply defined by a set of 
safe states. Unfortunately, computing robustness for safety properties does not 
directly reduce to computing robustness for invariance properties. 

When transforming a safety property $P_{saf}$ to an invariance property, we 
compose the environment and the safety property. Let us assume that there are 
no environmental constraints. In our scenario, the invariance property 
$P_{inv}$ is defined based on the composed system $E \| C \| P_{saf}$, i.e., 
$P_{inv} \subseteq Q_{E \| C \| P_{saf}}$. The composed system introduces 
memory to the environment to differentiate when the safety property is violated 
or not. This memory addition prevents a simple reduction between invariance and 
safety properties since robustness is defined with respect to the environment. 
Robustness defines new transitions in E whereas computing robustness with 
respect to $P_{inv}$ defines new transitions in $E \| C \| P_{saf}$. For this 
reason, we cannot simply reduce the problem of computing A with respect to 
safety properties to the problem of computing A with respect to an invariance 
property. 


5.1 Brute-Force Algorithm 


One way of solving Problem 1 is via a brute-force algorithm. Intuitively, this 
algorithm is broken into two parts: (i) finding the set of robust deviations that 
satisfy the environmental constraint, and (ii) identifying the maximal ones within 
this set. In part (i), we verify $E_d \| C \models P_{saf}$ and 
$E_d \models P_{env}$ for all deviations 
$d \subseteq (Q_E \times Act_E \times Q_E) \setminus R_E$, which can be solved using standard model checking 
techniques [2]. Since this algorithm checks if every deviation set is robust or not, 
it is clear that it computes A. 
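The two parts of the brute-force method can be sketched as follows. This is a minimal illustration, not the tool's implementation: LTSs are plain sets of (source, action, target) triples over one shared alphabet, `reaches_err` and `brute_force` are hypothetical names, and the running example's environment is assumed to be 1 -a-> 2 -b-> 3 since the figure is not reproduced in the text. To keep the run small, the example only enumerates the eight a-labelled candidate transitions; b-labelled deviations are always robust here because the controller never enables b.

```python
from collections import deque
from itertools import combinations, product

ERR = "err"

def reaches_err(ltss, inits, actions):
    """BFS over the synchronous product; True if some component reaches ERR."""
    start = tuple(inits)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if ERR in state:
            return True
        for a in actions:
            succs = [{p for (q, b, p) in lts if q == s and b == a}
                     for lts, s in zip(ltss, state)]
            for nxt in product(*succs):   # empty if any component blocks a
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

def brute_force(r_e, q0, r_c, c0, r_p, p0, actions, candidates, env=None):
    """Part (i): keep every robust (and feasible) deviation; part (ii): keep maximal ones.

    env, if given, is a pair (transitions of P_env, initial state of P_env).
    """
    robust = []
    cand = sorted(candidates)
    for k in range(len(cand) + 1):
        for d in combinations(cand, k):
            r_d = r_e | set(d)            # transition relation of E_d
            safe = not reaches_err([r_d, r_c, r_p], [q0, c0, p0], actions)
            feasible = env is None or not reaches_err([r_d, env[0]], [q0, env[1]], actions)
            if safe and feasible:
                robust.append(frozenset(d))
    return {d for d in robust if not any(d < other for other in robust)}

R_E = {(1, "a", 2), (2, "b", 3)}
R_C = {("c", "a", "c")}                                   # only a is enabled
R_P = {("A", "a", "B"), ("B", "a", "C"), ("C", "a", ERR)} | {(s, "b", ERR) for s in "ABC"}
CANDIDATES = {(q, "a", p) for q in (1, 2, 3) for p in (1, 2, 3)} - R_E
envelope = brute_force(R_E, 1, R_C, "c", R_P, "A", "ab", CANDIDATES)
for d in sorted(map(sorted, envelope)):
    print(d)
```

Even in this reduced setting the loop visits all 2^8 = 256 candidate sets, which illustrates why the enumeration does not scale with the size of the environment.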


5.2 Controlling the Deviations Without Environmental Constraints 


Due to the lack of scalability of the brute-force algorithm, we search for more 
efficient ways to compute A. For readability purposes, we start by describing 
our algorithm in detail assuming no environmental constraints, i.e., 
unconstrained environment $P_{env} = Act_E^*$. In the next section, we show how 
to use this algorithm to completely solve Problem 1, i.e., for a possibly 
constrained environment $P_{env} \subseteq Act_E^*$. 


Overview of the Control Algorithm. At a high level, we transform the 
problem of computing A to a problem of controlling environmental transitions 
to avoid safety violations. Intuitively, we control deviations to force them to 
be robust, i.e., we take the viewpoint that we can control transitions in 
$(Q_E \times Act_E \times Q_E) \setminus R_E$. Different ways of controlling 
transitions in $(Q_E \times Act_E \times Q_E) \setminus R_E$ provide different 
robust deviations. 


[Figure 4: pipeline with the stages Compute Deviated System, Compute 
Meta-system, Control Meta-system, and Generate Robustness.] 

Fig. 4. Overview of our approach to compute robustness for the unconstrained 
environment. The inputs are the LTSs of environment E, controller C, and 
property $P_{saf}$. The set A is the set of all environment transitions, 
$A = Q_E \times Act_E \times Q_E$. The LTSs $T_1, \ldots, T_n \subseteq F$ 
represent controlled meta-systems. 


Figure 4 provides an overview of our approach. First, we define LTS $E_A$ to be 
the deviated system with all possible transitions, i.e., 
$A = Q_E \times Act_E \times Q_E$. The deviated system $E_A$ is the maximally 
deviated environment since it encompasses every possible deviated system $E_d$ 
for $d \subseteq Q_E \times Act_E \times Q_E$. 

Next, we compose the deviated environment $E_A$ with controller C and property 
$P_{saf}$ to create a "meta-system" F. This meta-system provides information 
about how the deviated environment $E_A$ under the control of C can violate 
$P_{saf}$. Following this composition, we pose a control problem over the 
meta-system to prevent any violation of $P_{saf}$. There are multiple ways of 
controlling this composed system; in our approach, we obtain a finite number of 
controllers encoded as $T_i \subseteq F$. These different ways of controlling 
the meta-system provide different robust deviations from which we can extract 
A. To make our approach concrete, we describe each step in detail using our 
running example, shown in Fig. 2. 


Constructing the Meta-system. The deviated environment 
$E_A = E_{Q_E \times Act_E \times Q_E}$ contains the behavior of any other 
deviated environment. Therefore, we define the meta-system to be the 
composition of deviated environment $E_A$, controller C, and property 
$P_{saf}$, i.e., $F = E_A \| C \| P_{saf}$. Figure 5a shows the meta-system F 
for our running example. Since C only has one state, we omit its state from the 
state names in Fig. 5a, i.e., states in Fig. 5a are defined as 
$(q_e, q_p) \in Q_E \times Q_{P_{saf}}$ instead of 
$(q_e, q_c, q_p) \in Q_E \times Q_C \times Q_{P_{saf}}$. All transitions in F 
are labeled a, omitted in Fig. 5a, since controller C only enables action a. We 
also identify in F which transitions are derived from the environment (dashed 
blue) and which are derived from deviations (green). For simplicity, we define 
a single error state in F to capture every 
$(q_e, q_c, err) \in Q_E \times Q_C \times Q_{P_{saf}}$. 


(c) Meta-controller $T_2$ 


Fig. 5. Meta-systems. All transitions have action a since C only enables action 
a (see Fig. 2b). Dashed blue transitions represent transitions that are 
feasible in $R_E$ while solid green transitions represent the deviated 
transitions in $(Q_E \times Act_E \times Q_E) \setminus R_E$. The shaded area 
in Fig. 5b contains all safe states in the meta-system. 


Controlling the Meta-system. Once the meta-system is constructed, we pose 
a meta-control problem over F to ensure that the meta-system avoids the error 
states, i.e., states $(q_e, q_c, err) \in Q_E \times Q_C \times Q_{P_{saf}}$. 
These error states represent safety violations in the closed-loop system. For 
instance, in Fig. 5a, if transition $(2, C) \to err$ occurs, then the 
closed-loop system violates $P_{saf}$ since more than two actions a were 
executed. In this meta-control problem, a meta-controller can disable 
transitions in F that originate from deviations, i.e., transitions in 
$(Q_E \times Act_E \times Q_E) \setminus R_E$. 


Problem 2. Given meta-system F, synthesize a meta-controller $T \subseteq F$ 
such that (1) for any $(q_e, q_c, q_p) \in Q_T$, state $q_p \neq err$; and 
(2) for any $((q_e, q_c, q_p), a, (q_e', q_c', q_p')) \in R_F \setminus R_T$ 
such that $(q_e, q_c, q_p) \in Q_T$, it follows that 
$(q_e, a, q_e') \notin R_E$. 


Problem 2 states that the meta-controller is a subset of the meta-system 
F. We want to maintain the same structure as in F since we need to enforce 
that the meta-controller does not disable any transition associated with Rg. 
Condition (1) in Problem 2 ensures that property $P_{saf}$ is not violated. On the 
other hand, condition (2) guarantees that only transitions assigned to deviations 
are disabled. 
Back to our example, the LTS $T_1$ described by the shaded area in Fig. 5b 
demonstrates a possible meta-controller that satisfies Problem 2. Condition (1) 
is satisfied since the error state is not included in the shaded area. With 
respect to condition (2), only solid green transitions are disabled. Figure 5c 
shows another meta-controller. 

To solve Problem 2, one can solve a safety game over F using fixed-point 
computation [15,25]. Due to space limitations, we point the reader to [27], 
pg. 23 for the solution to this safety game. 


Extracting Robust Deviations. Each meta-controller that solves Problem 2 
relates to a robust deviation. Intuitively, a meta-controller disables 
deviations that would violate $P_{saf}$. For instance, the meta-controller 
$T_1$ shown in Fig. 5b disables transition $(3, B) \to (1, C)$, which relates 
to disabling transition $3 \to 1$ in the environment. Figure 3a depicts the 
deviated environment related to meta-controller $T_1$. Similarly, Fig. 3b shows 
the deviated environment associated with meta-controller $T_2$. 

To extract a robust deviation from a meta-controller, we have to (1) identify 
the transitions that the meta-controller has disabled; and (2) project the disabled 
transitions to transitions in $Q_E \times Act_E \times Q_E$. Since a meta-controller is a subset 
of the meta-system, the disabled transitions are obtained by comparing F and 
T. Intuitively, the disabled transitions are those that escape the shaded area in 
Fig. 5. 


$Disabled := \{(q, a, q') \in R_F \mid q \in Q_T \wedge (q, a, q') \notin R_T\}$ (1) 


For instance, in the case of meta-controller $T_1$, the transition 
$((1, B), a, (1, C))$ belongs to the Disabled set. Next, based on the disabled 
transitions, we project them to transitions in $Q_E \times Act_E \times Q_E$, 
i.e., transitions in the environment. 


$del := \{(q_e, a, q_e') \in Q_E \times Act_E \times Q_E \mid ((q_e, q_c, q_p), a, (q_e', q_c', q_p')) \in Disabled\}$ (2) 

Transitions in del are the transitions to be deleted from 
$Q_E \times Act_E \times Q_E$ such that 
$(Q_E \times Act_E \times Q_E) \setminus del$ is a robust deviation set. If 
transitions in del are included in a deviation set, they can cause a violation 
of property $P_{saf}$. In the case of $T_1$, the transition (1, a, 1) is 
included in del. If we maintain, for instance, transition $1 \to 1$ as part of 
a deviation set d, then the closed-loop $E_d/C$ violates the property 
$P_{saf}$ since the path $(1, A) \to (1, B) \to (1, C) \to err$ would be 
feasible in the meta-controller. 
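Equations (1) and (2) translate directly into two set comprehensions. The sketch below applies them to a small, hypothetical fragment of a meta-system (it is not the full meta-system of Fig. 5); meta-states are written as triples $(q_e, q_c, q_p)$ and the helper names are illustrative.

```python
def disabled(r_f, q_t, r_t):
    """Eq. (1): transitions of F that start inside Q_T but are missing from R_T."""
    return {t for t in r_f if t[0] in q_t and t not in r_t}

def project(transitions):
    """Eq. (2): keep only the environment component of each meta-state."""
    return {(src[0], a, dst[0]) for (src, a, dst) in transitions}

# Hypothetical fragment: meta-states are (q_e, q_c, q_p).
s1A, s1B, s2B, s1C = (1, "c", "A"), (1, "c", "B"), (2, "c", "B"), (1, "c", "C")
r_f = {(s1A, "a", s2B), (s1A, "a", s1B), (s1B, "a", s1C)}   # transitions of F
q_t = {s1A, s2B}                                            # states kept by T
r_t = {(s1A, "a", s2B)}                                     # transitions kept by T

dis = disabled(r_f, q_t, r_t)
delete = project(dis)
all_transitions = {(q, "a", p) for q in (1, 2) for p in (1, 2)}
deviation = all_transitions - delete
print(sorted(delete), sorted(deviation))
```

Note that the transition leaving (1, c, B) is not counted as disabled in this fragment, because its source state lies outside $Q_T$ and is therefore never visited under the meta-controller.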


Computing Robustness A. Problem 2 searches for meta-controllers that guarantee 
the satisfaction of property $P_{saf}$. To compute A, we need to obtain a 
finite number of meta-controllers. Algorithm 1 formalizes our description in 
Fig. 4. It takes as input the environment E, the controller C, a deviation set 
d, and a safety property P. From the algorithm overview description in Fig. 4, 
we have that for the unconstrained environment 
$d = A = Q_E \times Act_E \times Q_E$ and $P = P_{saf}$. 

Algorithm 1. COMPUTE-ROBUSTNESS 
Input: LTSs E, C, P and deviation d 
Output: Set of deviations D 
 1: $D \leftarrow \emptyset$ 
 2: $F \leftarrow E_d \| C \| P$ 
 3: $Err \leftarrow \{(q_e, q_c, q_p) \in Q_F \mid q_p = err\}$ 
 4: $W \leftarrow Inv(Q_F \setminus Err)$ 
 5: for all $S \in 2^W \setminus \{\emptyset\}$ do 
 6:     $T \leftarrow$ META-CONTROLLER(S, F) 
 7:     $del \leftarrow \{(q_e, a, q_e') \in d \mid \exists ((q_e, q_c, q_p), a, (q_e', q_c', q_p')) \in R_F \setminus R_T$ s.t. $(q_e, q_c, q_p) \in Q_T\}$ 
 8:     $D \leftarrow D \cup \{d \setminus del\}$ 
 9: while $\exists d_1, d_2 \in D$ s.t. $d_1 \subset d_2$ do 
10:     $D \leftarrow D \setminus \{d_1\}$ 
    return D 

11: procedure META-CONTROLLER(S, F) 
12:     $S \leftarrow Inv(S)$ 
13:     if $q_{0,F} \notin S$ then 
14:         $T \leftarrow \emptyset$ 
15:     else 
16:         $Q_T \leftarrow S$, $Act_T \leftarrow Act_F$, $q_{0,T} \leftarrow q_{0,F}$ 
17:         $R_T \leftarrow \{(q, a, q') \in S \times Act_T \times S \mid (q, a, q') \in R_F\}$ 
    return T 

In Algorithm 1, line 4 computes the largest possible set of invariant states 
that avoid the error state, i.e., $Inv(Q_F \setminus Err)$ solves the safety 
game as shown in [27], pg. 23. Based on this invariant set, each iteration in 
the loop (lines 5-8) computes a meta-controller (line 6) and stores its 
respective robust deviation (line 8). The meta-controller T is also computed by 
using the function Inv. The meta-controller solution ensures that 
$Q_T \subseteq S$. Line 7 computes environmental transitions that must be 
deleted in order to obtain a robust deviation. The computed robust deviations 
are stored in D. Lastly, the loop in lines 9-10 ensures that only maximal 
robust deviations remain in D, which the algorithm returns as A. 

In more detail, to solve Problem 2, we must guarantee that the meta-system 
F does not reach any states in 
$Err := \{(q_e, q_c, q_p) \in Q_F \mid q_p = err\}$. Formally, we compute the 
set $Inv(Q_F \setminus Err)$, which contains every state in F that does not 
reach a state in Err via a transition associated with $R_E$. Based on this 
invariant set, we can extract any meta-controller that remains within this set. 
Informally, the META-CONTROLLER(S, F) in line 11 of Algorithm 1 computes a 
meta-controller that remains within states in S. First, this procedure computes 
the invariant set of S, i.e., Inv(S) with respect to meta-system F (line 12). 
In this manner, a meta-controller is defined by projecting the meta-system F to 
states and transitions in the set of states Inv(S) (lines 16-17). 
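The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading rather than the authors' Kotlin implementation: LTSs are sets of (source, action, target) triples over one shared alphabet, `inv` is a plain greatest-fixed-point loop standing in for the safety-game solution of [27], subsets whose invariant set excludes the initial state are skipped (an assumption about how the empty meta-controller is handled), and the environment 1 -a-> 2 -b-> 3 is an assumed encoding of the running example.

```python
from itertools import combinations

ERR = "err"

def meta_system(r_e, dev, r_c, r_p):
    """R_F of F = E_d || C || P, plus the subset of R_F derived from R_E."""
    r_f, env_part = set(), set()
    for (qe, a, qe2) in r_e | dev:
        for (qc, b, qc2) in r_c:
            for (qp, c, qp2) in r_p:
                if a == b == c:                      # synchronize on the action
                    t = ((qe, qc, qp), a, (qe2, qc2, qp2))
                    r_f.add(t)
                    if (qe, a, qe2) in r_e:
                        env_part.add(t)
    return r_f, env_part

def inv(states, env_part):
    """Largest subset of `states` that no environment-derived transition leaves."""
    s = set(states)
    while True:
        bad = {q for (q, _, q2) in env_part if q in s and q2 not in s}
        if not bad:
            return s
        s -= bad

def compute_robustness(q_e, q_c, q_p, r_e, r_c, r_p, q0, dev):
    r_f, env_part = meta_system(r_e, dev, r_c, r_p)                        # line 2
    safe = {(e, c, p) for e in q_e for c in q_c for p in q_p if p != ERR}  # Q_F \ Err
    w = list(inv(safe, env_part))                                          # line 4
    found = set()
    for k in range(1, len(w) + 1):                                         # line 5
        for subset in combinations(w, k):
            s = inv(subset, env_part)                                      # line 12
            if q0 not in s:
                continue      # assumption: no meta-controller for this subset
            # line 7: transitions of F leaving S, projected onto the environment
            delete = {(q[0], a, q2[0]) for (q, a, q2) in r_f
                      if q in s and q2 not in s}
            found.add(frozenset(dev - delete))                             # line 8
    return {d for d in found if not any(d < other for other in found)}     # lines 9-10

Q_E, Q_C, Q_P = {1, 2, 3}, {"c"}, {"A", "B", "C", ERR}
R_E = {(1, "a", 2), (2, "b", 3)}
R_C = {("c", "a", "c")}
R_P = {("A", "a", "B"), ("B", "a", "C"), ("C", "a", ERR)} | {(s, "b", ERR) for s in "ABC"}
ALL = {(q, a, p) for q in Q_E for a in "ab" for p in Q_E}
envelope = compute_robustness(Q_E, Q_C, Q_P, R_E, R_C, R_P, (1, "c", "A"), ALL)
for d in envelope:
    print(sorted(t for t in d if t[1] == "a" and t not in R_E))
```

Under these assumptions the sketch enumerates the non-empty subsets of an eight-state invariant set and returns three maximal deviations, matching the count reported for the running example; every b-labelled transition survives in each of them because the controller never enables b.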

The following theorem shows that A computed via Algorithm 1 is equal to A 
as in Definition 5 when $P_{env} = Act_E^*$, i.e., Algorithm 1 partially solves 
Problem 1. 


Theorem 1. Given LTS E, controller C, and property $P_{saf}$, Algorithm 1 
outputs A as in Definition 5 when $P_{env} = Act_E^*$. 
Proof sketch. In order to show that Theorem 1 holds, we provide two 
intermediate lemmas whose proofs are available at [27], pg. 24 (Lemma 2 and 
Lemma 3). The first lemma states that every meta-controller T produces a robust 
deviation. In this manner, we show that for every $d \in A$, the deviation d is 
robust. The second lemma shows that for every maximal robust deviation 
$d \in A$, there exists a meta-controller T associated with deviation d. 
Consequently, Algorithm 1 computes every possible maximal robust deviation. 


Using Algorithm 1 to compute A for our running example, we obtain A that 
contains the three maximal robust deviations shown in Fig. 3. Lastly, we provide 
the computational complexity of Algorithm 1. 


Theorem 2. Algorithm 1 outputs A in $O(2^{|Q_E||Q_C|(|Q_P|-1)})$. 

Proof. It follows from the size of $2^W$. 


Although Algorithm 1 has exponential complexity, we empirically show in Sect. 6 
that it scales better than the brute-force algorithm. 
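As a rough worked instance, consider the running example, reading the bound of Theorem 2 as $2^{|Q_E||Q_C|(|Q_P|-1)}$ and the brute-force search space as all subsets of $(Q_E \times Act_E \times Q_E) \setminus R_E$. The text gives three environment states, a one-state controller, and property states A, B, C, and err; the value $|R_E| = 2$ is an assumption, since the figure of the environment is not reproduced here.

```latex
\begin{align*}
\text{Algorithm 1:}\quad & |2^{W}| \le 2^{|Q_E|\,|Q_C|\,(|Q_P|-1)} = 2^{3 \cdot 1 \cdot 3} = 512,\\
\text{brute force:}\quad & 2^{|Q_E|^{2}\,|Act_E| - |R_E|} = 2^{9 \cdot 2 - 2} = 65536 .
\end{align*}
```

Both counts are exponential, but the exponent for Algorithm 1 grows with the number of states of the meta-system rather than with the number of candidate transitions.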


Heuristics to Exploit the Structure of F. In Algorithm 1, we compute 
robust deviations for every possible subset of the largest invariant state set 
(cf. line 5). To improve the efficiency of Algorithm 1, we provide a sound and 
complete heuristic that identifies and skips redundant subsets in 
$2^W \setminus \{\emptyset\}$. The heuristic is based on the observation that 
sets of states that are not directly connected in F correspond to redundant 
deletion sets from $Q_E \times Act_E \times Q_E$. As such, the heuristic 
exploits the structure of F by performing a depth-first search over its state 
space, hence skipping disconnected groups of states. For instance, the 
heuristic will skip the subset $\{(1, A), (3, C)\}$ because (1, A) and (3, C) 
are not connected in F. This subset is redundant because its deletion set 
$del = \{((1, A), (1, B)), ((1, A), (2, B)), ((1, A), (3, B))\}$ is identical 
to the deletion set for the subset $\{(1, A)\}$, which is connected. In the 
worst-case scenario, our heuristic computes the power set of W, i.e., exactly 
as in line 5. 
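One way to read this heuristic is as an enumeration of only those subsets in which every state is reachable from the initial state without leaving the subset. The sketch below follows that reading with an explicit depth-first stack; it is an interpretation, not the tool's code, and `connected_subsets`, the successor map, and the four-state graph are hypothetical.

```python
def connected_subsets(q0, succ, allowed):
    """Yield every subset of `allowed` that contains q0 and in which each state
    is reachable from q0 using only states of the subset."""
    seen = set()
    stack = [frozenset([q0])]
    while stack:                       # depth-first growth, one state at a time
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        yield s
        frontier = {q2 for q in s for q2 in succ.get(q, ())
                    if q2 in allowed and q2 not in s}
        for q2 in frontier:
            stack.append(s | {q2})

# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 3.
succ = {0: [1, 2], 1: [3]}
subsets = set(connected_subsets(0, succ, {0, 1, 2, 3}))
print(len(subsets), "connected subsets instead of", 2 ** 4 - 1)
```

Here the subset {0, 3} is never produced because state 3 cannot be reached from 0 without passing through 1, which mirrors how a disconnected group such as {(1, A), (3, C)} would be skipped.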


5.3 Controlling the Deviations with Environmental Constraints 


When introducing environmental constraints, we must eliminate the robust 
deviations that violate these constraints as described in Definition 5. One 
might think that $P_{env}$ and $P_{saf}$ could be combined as a single safety 
property for which we then compute A. However, this approach does not work 
since $P_{env}$ must be enforced only by the environment whereas $P_{saf}$ is a 
property of the closed-loop system. Another approach is to verify if $P_{env}$ 
is satisfied for each deviation obtained in the for-loop (lines 5-8) in 
Algorithm 1. Although this approach is feasible, in practice, we want to reduce 
the number of deviations, using $P_{env}$, before we compute the robust 
deviations. For this reason, we describe a sequential algorithm shown in 
Fig. 6. In the constrained scenario, Algorithm 1 is used multiple times instead 
of a single time as in the unconstrained scenario (Sect. 5.2). 


[Figure 6: Part (a) takes the inputs E, $C_{all}$, $P_{env}$, and A and runs 
Comp-Rob (Algorithm 1) on them to produce the output passed to the next stage.] 

Fig. 6. Overview of our approach to compute robustness for constrained environments. 


The algorithm to compute robustness for constrained environments can be 
broken into two parts: (a) computing all maximal environment deviations $d_i$ 
that satisfy $P_{env}$; and (b) computing robust deviations for each deviated 
environment $E_{d_i}$ found in part (a). Computing the maximal environments 
that satisfy $P_{env}$ reduces to computing maximal deviations of E with 
respect to a controller that allows every environment action, $C_{all}$. 
Formally, the behavior of $C_{all}$ does not restrain E, 
$beh(C_{all}) = Act_E^*$, and it can be described by a one-state LTS. 
Therefore, the output of part (a) is the set of maximal deviations $d_i$ with 
respect to E, $C_{all}$, and $P_{env}$, denoted as maximal environment 
deviations. Each maximal deviated environment $E_{d_i}$ satisfies $P_{env}$. 

Once we have obtained all maximal environment deviations that satisfy 
$P_{env}$, we focus on finding the maximal robust deviations with respect to C 
and $P_{saf}$. In other words, we run Algorithm 1 for each maximal deviated 
environment $E_{d_i}$ together with C and $P_{saf}$. Since each resulting 
deviation d is a subset of $d_i$, the deviated system $E_d$ satisfies 
$P_{env}$. 

Each maximal deviated environment Ej, generates a set of maximal robust 
deviations D; with respect to C and Pat The final step is combining these 
maximal robust deviations with respect to each di. Since they are maximal with 
respect to di, there could be deviations that are not maximal as defined by 
Definition 5. The post-processing step combines the deviations and eliminates 
any non-maximal deviations; and it outputs A as in Definition 5. The correctness 
of this algorithm follows from Theorem 1. 
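
The post-processing step can be illustrated with a small Python sketch. It is not taken from the tool (which is written in Kotlin); the names `keep_maximal` and `combine`, and the encoding of a deviation as a frozen set of (source, action, target) transitions, are assumptions made for illustration only.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Assumed encoding: a transition is (source state, action, target state),
# and a deviation is a set of such transitions.
Transition = Tuple[str, str, str]
Deviation = FrozenSet[Transition]


def keep_maximal(deviations: Iterable[Deviation]) -> List[Deviation]:
    """Drop duplicates and every deviation strictly contained in another one."""
    unique = set(deviations)
    return [d for d in unique if not any(d < other for other in unique)]


def combine(per_environment: Iterable[Iterable[Deviation]]) -> List[Deviation]:
    """Merge the sets D_i obtained for each E_{d_i}, then filter non-maximal ones."""
    merged = [d for D_i in per_environment for d in D_i]
    return keep_maximal(merged)
```

For instance, if one run yields the deviation {t1, t2} and another yields {t1} and {t3}, then {t1} is discarded because it is contained in {t1, t2}, while {t3} is kept.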


6 Case Studies 


6.1 Implementation 


We have implemented a prototype tool for computing robustness [28]. The tool 
accepts a model of an environment, a controller, and a safety property, as well
as an optional list of environmental constraints, and outputs Δ. The tool has
support for comparing the robustness of two controllers as well as the robustness 
of a controller with respect to two separate safety properties. Currently, the 
environment, controller, safety property, and environmental constraints must be 
encoded in Finite State Process (FSP) notation [23] but this is not a fundamental 
limitation. 

Fig. 7. The beam components of the two Therac machines. The hardware interlocks
cause C_beam to have a fifth state “switching mode” that will only switch to X-ray mode
after the flattener rotates into place.


We wrote the tool in the Kotlin programming language. Our tool includes 
an implementation of the brute-force algorithm from Sect. 5.1, as well as an
implementation of Algorithm 1 and Algorithm 1 with heuristics. In the following 
case studies, we leverage the tool to calculate and compare the robustness of 
several systems. We summarize our performance results for each case study in 
Sect. 6.6. 


6.2 Therac-25 


Background. In Sect. 2, we introduced the Therac-25 radiation therapy
machine. In this section, we present a case study in which we compare the 
robustness of the Therac-25 to that of its predecessor, the Therac-20. We begin 
by showing that the Therac-20 is strictly more robust than the Therac-25. We 
then use this information to identify and fix a critical safety bug in the Therac-25 
model. 


Therac-20. The Therac-20 is a radiation therapy machine that was designed 
before the Therac-25. Unlike the Therac-25, the Therac-20 was not known for 
causing accidents that led to injuries and death. A key difference between the 
two machines is that the Therac-20 includes hardware interlocks in its beam 
component (Fig. 7a), while the Therac-25 does not (Fig. 7b). The purpose of
the hardware interlocks is to provide a layer of security at the hardware level for
upholding P_xflat. In our model, the interlocks work by ensuring that the flattener
is completely rotated into place before allowing an operator to fire an X-ray
beam. Unfortunately, hardware interlocks were considered expensive, so they were
omitted from the design of the later Therac-25 model. In the following section,
we compare the robustness of the two Therac machines with respect to
the normative environment E and the key safety property P_xflat.


Comparing Controllers. Using standard model checking techniques [2], we
can confirm that both the Therac-20 and the Therac-25 are safe with respect
to E and P_xflat. Historically, however, the Therac-20 is known to be safer than
the Therac-25. Therefore, we improve our safety analysis by also comparing
the robustness of the two machines with respect to E, P_xflat, and an
environmental constraint P_env. P_env, shown in [27], pg. 26, Fig. 11, restricts the
environment to firing the beam at most once.

Fig. 8. Visual robustness comparison between the two Therac machines. Both
machines are robust against gray transitions, but only the Therac-20 is robust
against green transitions. (Color figure online)

Fig. 9. Software fix that eliminates the race condition in the Therac-25.

Our tool reports that the Therac-20 is strictly more robust than the Therac- 
25. To understand this result, we can examine the difference between the robust- 
ness for each machine. We show this difference visually by presenting one max- 
imal robust deviation from each machine in Fig. 8. This figure shows that the 
Therac-20 is robust against the scenario in which the operator 1) types “e” to 
select electron beam mode, 2) optionally types “enter”, 3) presses the “up” arrow 
key, and finally 4) types “x” to switch the beam into X-ray mode. The Therac- 
25, however, is not robust against this scenario. We see this in Fig. 8 because 
the series of actions must pass through at least one green arrow, where a green 
arrow indicates a transition that the Therac-25 is not robust against. In fact, the 
Therac-25 does not have any maximal robust deviations that allow this scenario. 

The Therac-25’s lack of robustness to the scenario above represents a race 
condition that occurs after the operator switches into X-ray mode from electron 
mode. In this scenario, if the operator types “enter” and fires the X-ray beam 
before the flattener rotates into place, the beam will fire an unflattened X-ray at 
the patient. This critical bug was responsible for real-world radiation overdoses, 
several of which resulted in death [18]. 


Fixing the Software Bug. In the previous section, we identified a critical 
software bug in the Therac-25. Our goal in the current section is to fix this bug 
entirely in the terminal software, thus avoiding an expensive hardware solution. 

In Fig. 7a, we see that the hardware interlocks prevent a race condition by 
blocking the operator from typing a “b” until the flattener is rotated into place. 
Thus we can fix the race condition in software by altering the terminal to block 
the operator from typing a “b” until the flattener is rotated into place. We 
implement this fix by redesigning the terminal to block all keystrokes from
the instant it issues a “beam ready” message until the turntable rotates into
place, as shown in Fig. 9. Finally, we use our tool to evaluate the robustness of 
the fix. The tool reports that the fixed Therac-25 design is strictly more robust 
than the original, and equally robust to the Therac-20. 
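
The idea behind the fix can be sketched as a small state machine. This is only an illustrative Python sketch, not the FSP model from Fig. 9; the class `Terminal` and its method names are hypothetical.

```python
class Terminal:
    """Hypothetical terminal that ignores keystrokes while the turntable moves."""

    def __init__(self) -> None:
        self.locked = False
        self.accepted = []  # keystrokes that were passed on to the machine

    def beam_ready(self) -> None:
        # The terminal has just issued "beam ready": block the keyboard.
        self.locked = True

    def turntable_in_place(self) -> None:
        # The turntable has finished rotating: keystrokes are allowed again.
        self.locked = False

    def key(self, k: str) -> bool:
        """Return True if the keystroke is accepted, False if it is blocked."""
        if self.locked:
            return False
        self.accepted.append(k)
        return True
```

In this sketch, a “b” typed between `beam_ready()` and `turntable_in_place()` is dropped, which mirrors the behavior that the hardware interlocks enforce in the Therac-20 model.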


6.3 Voting 


Background. In this section, we consider a case study of an electronic voting 
machine, introduced in [46]. In this case study, we model the voting machine, a 
voter, and a corrupt election official who attempts to “flip” the voter’s choice. 
We define the voting machine as the composition of a voting booth and a user 
interface, shown in [27], pg. 26, Fig. 12a and Fig. 12b, respectively.

In the normative environment (shown in Fig. 10a), the voter enters the booth,
enters their password, selects a candidate, clicks the vote button, and finally 
confirms the choice. Unfortunately, some voters may inadvertently skip the con- 
firmation step and leave the booth early. This deviation from the normative 
behavior presents an opportunity for the election official to “flip” the intended 
vote: after the voter leaves the booth, the corrupt official can enter the booth, 
press “back” and change the vote to their liking. This scenario represents an 
actual election fraud that took place in the US [38]. 


(a) Normative environment for the voting machine.
(b) The voting machine’s robustness is identical with respect to P_all and P_cfm.

Fig. 10. Models for the voting machine example. In the figures above, the prefix “v”
represents actions by the voter.


Comparing Properties. In this case study, we consider two safety properties,
P_all and P_cfm, both of which imply the absence of vote flipping. P_all
requires that the election official cannot at any point select, vote, or confirm a
candidate. P_cfm is weaker, only requiring that the election official cannot at any
point confirm a candidate selection.

Using our tool for comparison, we see that the voting machine is equally
robust with respect to each property. This result is surprising because
P_cfm is weaker than P_all. To understand this result, we examine Fig. 10b where
we present the sole maximal robust deviation for each property. In this figure, it
is clear that the voting machine is not robust against any deviation in which the
voter enters their password and then exits the booth without confirming their
vote. The key insight is that, when an election official has the ability to confirm,
it implies that the official can also select and vote. Therefore, we desire a voting
machine without this implication because it will reduce the number of points of
failure. For example, we could redesign the voting machine to require a password
as part of the confirmation step. In light of this insight, a designer could choose
to specify a margin of safety into the machine’s specification by requiring that
it is strictly more robust with respect to P_cfm than P_all.
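
To make the relationship between the two properties concrete, the following Python sketch checks them on finite traces. The action names and the "eo." prefix for election-official actions are hypothetical; only the "v" prefix for voter actions appears in Fig. 10.

```python
from typing import Sequence

OFFICIAL = "eo."  # hypothetical prefix for election-official actions


def satisfies_p_all(trace: Sequence[str]) -> bool:
    """P_all: the official never selects, votes, or confirms."""
    forbidden = {OFFICIAL + a for a in ("select", "vote", "confirm")}
    return not any(action in forbidden for action in trace)


def satisfies_p_cfm(trace: Sequence[str]) -> bool:
    """P_cfm: the official never confirms."""
    return OFFICIAL + "confirm" not in trace


# A trace in which the official selects a candidate but never confirms.
partial = ["v.enter", "v.password", "v.select", "v.vote", "v.exit",
           "eo.enter", "eo.back", "eo.select"]
```

Every trace that satisfies P_all also satisfies P_cfm, but not the other way around: the trace `partial` violates P_all while still satisfying P_cfm, which is why P_cfm is the weaker property.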


6.4 Oyster 


Background. The Oyster example was introduced in [41], in which the authors 
modeled the Oyster card that is used in the public transportation system in the
United Kingdom. In our model, the controller consists of an entry gate and an 
exit gate, where the card holder taps the Oyster card at the start and end of 
their journey respectively. The environment models the actions of a card holder; 
in the normative environment, a card holder chooses to tap with either their 
Oyster card or a credit card, and taps in and out with the chosen card. The key 
safety property is avoiding an incomplete journey, in which a card holder taps 
in with one card and taps out with a different card. 


Calculating Robustness. An incomplete journey is avoided under the nor- 
mative environment. We calculate the robustness of the system under the two 
environmental constraints 1) Oyster cards and credit cards give the correct infor- 
mation to the gates and 2) the gates operate correctly and calculate the correct 
fare when a card is tapped in and out. Unfortunately, the system is not robust 
to any deviations. 


6.5 PCA Pump 


Background. In this section, we model a patient-controlled analgesia (PCA) 
pump, originally introduced in [5]. A PCA pump is a medical device that dis- 
penses pain medicine to a patient, offering them partial control over the dose 
rate. A nurse uses the device interface to program the volume per dosage, as well 
as a minimum and maximum dose rate to protect the patient from an overdose. 
The pump includes batteries to power the device in case it is unplugged (e.g., by 
mistake by the nurse or patient), yet the power may fail if the device runs out of 
battery. In this case, the device cannot monitor the dosage amount or frequency, 
which may cause an overdose. Therefore, we define the key safety property P_pfail,
which requires the PCA pump to abstain from administering medicine after a
power failure.

In the normative environment, the nurse operates the pump using the following
three-step workflow: 1) plug in the pump and turn it on, 2) program the
desired dosage parameters into the pump and administer the treatment, and 3) 
turn off the device and unplug it. The nurse begins with step (1) and ends with 
step (3), but may omit or repeat step (2) as many times as needed. A diagram 
of the normative environment is available at [27], pg. 26, Fig. 13. Crucially, the 
pump is safe with respect to this environment and P_pfail because the workflow
assumes that the pump is never unplugged in step (2). 


Calculating Robustness. We use our tool to calculate the robustness of the 
pump with respect to the normative environment, P_pfail, and an environmental
constraint P_env. In this case study, P_env restricts the environment to actions
that are allowed by the pump’s interface. A diagram of the sole maximal robust 
deviation is available at [27], pg. 27, Fig. 14. The tool reports that the pump is 
robust against four actions, three of which allow the operator to change settings 
before administering the treatment, and the fourth allows the operator to turn 
off the device prematurely after programming the dosage parameters. Unfortu- 
nately, the pump is not robust against any deviations in which it is unexpectedly 
unplugged. This poses a key weakness in the pump that the designers may wish 
to improve upon. 


6.6 Results and Discussion 


We have run our tool on the examples and case studies above, and we present 
our results in Table 1. All tests were run on a MacBook Pro with an M1 Pro
chip and 32 GB of RAM. In the table, |Act| is the size of the union of Act_E, Act_C,
Act_{P_saf}, and Act_{P_env}; |d_max| is the size of the largest deviation in Δ; and
|W_{P_env}| is the size of the winning set for each maximal deviation d_i (separated
by a comma); NA indicates the absence of an environmental constraint. Furthermore,
“Wall Heur” denotes the wall time for running Algorithm 1 with the heuristic, while
“Wall Plain” denotes the wall time for running Algorithm 1, and “TO” indicates
a time-out after five minutes.

Our results demonstrate that calculating robustness is tractable across sev- 
eral different case studies. In particular, our tool’s performance on the larger 
PCA pump case study shows promising results in terms of scalability. Furthermore,
we have shown that Δ is useful as a means for both analysis and comparison
of controllers. For example, in the Therac-25 case study, robustness provided
a richer analysis than classic verification that helped us discover, and ultimately
fix, a critical race condition. Finally, we have also demonstrated in the voting
machine case study that robustness provides a means for comparing two prop- 
erties with respect to a controller and an environment. 

Table 1. Summary of results from running our tool.

Example            |Act|  |Q_E|  |Q_C|  |Q_P|  |W|   |W_{P_env}|  |Δ|  |d_max|  Wall Heur  Wall Plain
Running Example    2      4      2      4      6     NA           3    13       0.433 sec  0.431 sec
Therac-25 w/bug    9      5      21     5      62    28,30,31,37  4    21       4.921 sec  TO
Therac-25 w/fix    9      5      19     5      72    18,20,23,25  4    26       0.852 sec  TO
Therac-20          9      5      11     5      40    17,19,21,23  4    26       0.626 sec  TO
Voting wrt. P_cfm  9      7      13     3      66    7            1    12       0.469 sec  TO
Voting wrt. P_all  9      7      13     3      66    7            1    12       0.426 sec  TO
Oyster             8      4      17     2      15    8            1    4        0.472 sec  TO
PCA Pump           21     11     105    4      1396  34           1    15       1.922 sec  TO


7 Related Work 


Quantitative robustness notions for discrete transition systems have been inves- 
tigated in several works [3,4,8,16,24,32,40,42]. We capture robustness qualita- 
tively, which avoids the need for external cost functions over the discrete tran- 
sition systems. The problem of synthesizing robust controllers against deviated 
environments given by a designer is investigated in [45]. Since [45] focuses on 
synthesizing robust controllers, their framework does not address the analysis 
of robustness. Moreover, robust controllers are measured via a rank function 
(quantitatively). Robust linear temporal logic (rLTL) extends the binary view 
of LTL to a 5-valued semantics to capture different levels of property satisfaction 
[43]. This work is tangential to ours as it focuses on specifying robustness.

In [17,49], the authors define robustness as a set of environmental behaviors
for which a software system can guarantee safety. Defining robustness in
the semantic domain (i.e., in terms of behaviors) implicitly describes safe
environmental deviations. Our notion of robustness captures safe environmental
deviations explicitly in terms of transitions, which offer both syntactic (transitions)
and semantic (implied behaviors) information. Transition-based robustness also 
allows us to capture the safe environmental envelopes of a system; it is not clear 
how one might efficiently capture this information with only behaviors. 

In [29], the authors also define robustness based on additional transitions
to the environment. Their definition of robustness compares the perturbed controlled
behavior, i.e., beh(E_d || C), instead of directly comparing the additional
transitions. In this manner, the partial order used to define robustness in [29]
is different from our notion of robustness. Moreover, an efficient algorithm
is presented only for invariance properties. Extending the work in [29], the authors
explore the relationship between controller robustness and permissiveness for
invariance properties [30].

Robust control in discrete event systems is also an active area of research
[1,10,19-21,26,31,33,39,44,47,48]. However, these works usually deal with specific
types of faults such as communication delays, loss of information, or deception
attacks [1,20,21,26,31,39,47]. We capture model uncertainty with our robustness
definition, which can be attributed to these faults. Robustness against model
uncertainty is tackled in the works of [10,19,44,48]. In these works, deviations
are modeled by the behavior generated by the environment. On the other hand,
we model deviations by the inclusion of extra transitions. In [11], a controller
realizability problem is studied for environments modeled as modal transition 
systems, where a controller satisfies a property in all, some, or none of the LTS 
family. Our notion of robustness explicitly computes which systems in the LTS 
family satisfy the property. 

Lastly, robustness also relates to fault-tolerance. Fault-tolerance has been 
studied in the context of distributed systems [13,22,34]. In [6,9,12,14], the authors
study the synthesis of fault-tolerant programs by retrofitting initial fault-intolerant
programs. These
works focus on specific types of fault models, whereas our robustness model 
computes the safety envelope the controller is robust against. 


8 Conclusion 


In this paper, we introduced a new notion of robustness against environmen- 
tal deviations for discrete-state transition systems. Our notion of robustness is 
syntactically defined by additional transitions and semantically defined by the 
controlled behavior generated by these additional transitions. We provided two 
methods to compute robustness: a brute-force algorithm, and an algorithm based 
on a controller synthesis problem. We implemented these methods in a proto- 
type tool which we used to analyze several case studies. In these case studies, 
we demonstrated that our robustness analysis provides crucial information by 
identifying the environmental envelopes in which the system can guarantee its 
safety properties. 

As part of future work, we plan to extend our work to investigate robustness 
in the context of partially observable systems as well as in stochastic systems such 
as Markov decision processes (MDPs). We also plan to investigate the benefit
of considering additional environmental states, as well as additional transitions,
in our robustness analysis. Finally, we plan to extend our work beyond safety
properties, e.g., to liveness properties.


Abstract. We present the Verse library with the aim of making hybrid
system verification more usable for multi-agent scenarios. In Verse, decision-making
agents move in a map and interact with each other through
sensors. The decision logic for each agent is written in a subset of Python 
and the continuous dynamics is given by a black-box simulator. Multiple 
agents can be instantiated, and they can be ported to different maps 
for creating scenarios. Verse provides functions for simulating and veri- 
fying such scenarios using existing reachability analysis algorithms. We 
illustrate capabilities and use cases of the library with heterogeneous 
agents, incremental verification, different sensor models, and plug-n-play 
subroutines for post computations. 


1 Introduction


Automatic verification tools for hybrid systems have been used to analyze linear 
models with thousands of continuous dimensions [1,5,6] and nonlinear models 
inspired by industrial applications [6,14]. The state of the art and the chal- 
lenges are discussed in a recent survey [11]. Despite the potentially large user 
base, currently this technology is inaccessible without formal methods training. 
Automatic hybrid verification tools [10,13,17,25,31] require the input model to 
be written in a tool-specific language. Tools like C2E2 [15] attempt to translate
models from Simulink/Stateflow, but the language barrier goes down to the
underlying math models. The verification algorithms are based on variants of the 
hybrid automaton [3,21,24] which requires the discrete states (or modes) to be 
spelled out explicitly as a graph, with guards and resets labeling the transitions. 
We discuss related works in more detail in Sect. 6, including recently developed 
libraries that address the usability barrier [5,7,8].

In this paper, we present Verse, a Python library that aims to make hybrid
technologies more usable for multi-agent scenarios. The key features imple- 
mented are as follows: (1) In Verse, users write scenarios in Python. User-defined 
functions can be used to create complex agents, invariant requirements can be 
written as assert statements, and scenarios can be created by instantiating mul- 
tiple agents, all using the standard Python syntax. Verse parses this scenario 
and constructs an internal representation of the hybrid automaton for simula- 
tion and analysis. (2) Verse introduces an additional structure, called map, for 
defining the modes and the transitions of a hybrid system. Map contains tracks 
that can capture geometric objects (e.g., lanes or waypoints) that make it possi- 
ble to create new scenarios just by instantiating agents on new maps. With track 
modes, users do not have to explicitly write different modes for a vehicle following 
different waypoint segments. Finally, (3) Verse comes with functions for simula- 
tion and safety verification via reachability analysis. Developers can implement 
new functions, plug in existing tools, or implement advanced algorithms, e.g., for
incremental verification. In this tool paper, we illustrate use cases with heteroge- 
neous agents and different scenario setups, the flexibility of plugging in different 
reachability algorithms and the ability to develop more advanced algorithms 
(Sect. 5). Verse is available at https://github.com/AutoVerse-ai/Verse-library.


2 Overview of Verse 


We will highlight the key features of Verse with an example. Consider two drones 
flying along three parallel ∞-shaped tracks that are vertically separated in space
(shown by black lines in Fig. 1). Each drone has a simple collision avoidance 
logic: if it gets too close to another drone on the same track, then it switches to 
either the track above or the one below. A drone on T1 has both choices. Verse 
enables creation, simulation, and verification of such scenarios using Python, and 
provides a collection of powerful functions for building new analysis algorithms. 


Fig. 1. Left: A 3-d ∞-shaped map with example track mode labels. Center: Simulation
of a red drone nearing the blue drone on T1 and nondeterministically moving to T0 or T2.
Both branches are computed by Verse’s simulate function. Right: Computed reachable
sets of the two drones cover more possibilities: either drone can switch tracks when
they get close. All four branches are explored by Verse. The branch for blue drone 
moving downwards violates safety as it may collide with the red drone following T1. 


Creating Scenarios. Agents like the drones in this example are described by 
a simulator and a decision logic in an expressive subset of Python (see code 
in Fig. 2 and [26] for more details). The decision logic for an ego agent takes
as input its current state and the (observable) states of the other agents, and
updates the discrete state or the mode of the ego agent. For example, in lines 41-43
of Fig. 2 an agent updates its mode to begin a track change if there is any
agent near it. It may also update the continuous state of the ego agent. The
mode of an agent, as we shall see later in Sect. 3, has two parts: a tactical mode
corresponding to the agent’s decision or discrete state, and a track mode that is
determined by the map. Using the any and all functions, the agent’s decision
logic can quantify over other agents in the scene. User-defined functions are
also allowed (is_close, Fig. 2 line 41). Verse will parse this decision logic to
create an internal representation of the transition graph of the hybrid model
with guards and resets. The simulator can be written in any language and is
treated as a black box¹. For the examples discussed in this paper, the simulators
are also written in Python. Safety requirements can be specified using assert 
statements (see Fig. 5). 


38  def decisionLogic(ego: State, others: List[State], track_map):
39      next = copy.deepcopy(ego)
40      if ego.tactical_mode == TacticalMode.Normal:
41          if any((is_close(ego, other) and ego.track_mode==other.track_mode) for other in others):
42              next.tactical_mode = TacticalMode.MoveDown
43              next.track_mode = track_map.Tg(ego.track_mode, ego.tactical_mode, TacticalMode.MoveDown)
44          if any((is_close(ego, other) and ego.track_mode==other.track_mode) for other in others):
45              next.tactical_mode = TacticalMode.MoveUp
46              # ...
47      if ego.tactical_mode == TacticalMode.MoveUp:
48          if in_interval(track_map.altitude(ego.track_mode)-ego.z, -1, 1):
49              next.tactical_mode = TacticalMode.Normal
50              next.track_mode = track_map.Tg(ego.track_mode, ego.tactical_mode, TacticalMode.Normal)


Fig. 2. Decision Logic Code Snippet from drone_controller.py. 


Maps and Sensors. The map of a scenario specifies the tracks that the agents can 
follow. While a map may have infinitely many tracks, they fall in a finite number 
of track modes. For example, in this ∞-shaped map, each layer is assigned to a
track mode (T0-2) and all the tracks between each pair of layers are also assigned
to a track mode (M10, M01 etc.). When an agent makes a decision and changes 
its tactical mode, the map object determines the new track mode for the agent. 
The map abstraction makes scenarios succinct and enables portability of agents 
across different maps. Besides creating from scratch, Verse provides functions 
for generating map objects from OpenDRIVE [4] files. 


¹ This design decision for Verse is relatively independent. For reachability analysis,
Verse currently uses black-box statistical approaches implemented in DryVR [14] and
NeuReach [35]. If the simulator is available as a white-box model, such as differential
equations, then Verse could use model-based reachability analysis.

The sensor function defines which variables from an agent are visible to
other agents. The default sensor function allows all agents to see all variables;
we discuss how the sensor function can be modified to include bounded noise in
Sect. 5. A map, a sensor, and a collection of (compatible) agents together define
a scenario object (Fig. 3). In the first few lines, the drone agents are created,
initialized, and added to the scenario object. A scenario can have heterogeneous 
agents with different decision logics. 


scenario = Scenario()
drone_red = DroneAgent('drone_red', file_name='drone_controller.py')
drone_red.set_initial([init_l1_1, init_u_1], (CraftMode.Normal, TrackMode.T1))
scenario.add_agent(drone_red)
drone_blue = DroneAgent('drone_blue', file_name='drone_controller.py')
scenario.add_agent(drone_blue)
# ...
scenario.set_map(M6())
scenario.set_sensor(BaseSensor())
# traces = scenario.simulate(40, time_step)
traces = scenario.verify(40, time_step)


Fig. 3. Scenario specification snippet. 


Simulation and Reachability. Once a scenario is defined, Verse’s simulate func- 
tion can generate simulation(s) of the system, which can be stored and plotted. 
As shown in Fig. 1(Center), a simulation from a single initial state explores all 
possible branches that can be generated by the decision logics of the interacting
agents, up to a specified time horizon. Verse verifies the safety assertions
of a scenario by computing the over-approximations of the reachable sets for 
each agent, and checking these against the predicates defined by the assertions. 
Figure 1(Right) visualizes the result of such a computation performed using the 
verify function. In this example, the safety condition is violated when the blue 
drone moves downward to avoid the red drone. The other branches of the sce- 
nario are proved to be safe. The simulate and verify functions save a copy 
of the resulting execution tree, which can be loaded and traversed to analyze 
the sequences of modes and states that lead to safety violations. Verse makes it
convenient to plug in different reachability subroutines. It also provides power- 
ful functions to implement advanced verification algorithms, such as incremental 
verification. 
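
The check of reachable sets against a safety predicate can be illustrated with a minimal Python sketch. It does not use the Verse API; it assumes, for illustration only, that each agent's over-approximated reachable set is given as one axis-aligned box per time step, and that safety means a minimum separation d in the Chebyshev (max-norm) distance. The names `min_gap` and `verify_separation` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: one (low, high) interval per dimension.
Box = Tuple[Tuple[float, float], ...]


def min_gap(a: Box, b: Box) -> float:
    """Smallest Chebyshev distance between any point of a and any point of b."""
    gaps = []
    for (alo, ahi), (blo, bhi) in zip(a, b):
        # Per-dimension gap is zero when the two intervals overlap.
        gaps.append(max(0.0, blo - ahi, alo - bhi))
    return max(gaps)


def verify_separation(tube_a: List[Box], tube_b: List[Box], d: float) -> bool:
    """True if the two reach tubes are provably at least d apart at every step."""
    return all(min_gap(a, b) >= d for a, b in zip(tube_a, tube_b))
```

Because the boxes over-approximate the true reachable states, a result of True proves separation, whereas False only means that a violation could not be ruled out with these boxes.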


3 Scenarios in Verse 


A scenario in Verse is specified by a map, a collection of agents in that map, 
and a sensor function that defines the part of each agent visible to other agents. 
We describe these components below, and in Sect. 4, we will discuss how they 
formally define a hybrid system. 


Tracks, Track Modes, and Maps. A workspace W is a Euclidean space
in which the agents reside (for example, a compact subset of R^2 or R^3). An
agent’s continuous dynamics makes it roughly follow certain continuous curves
in W, called tracks, and occasionally the agent’s decision logic changes the track.
Formally, a track is simply a continuous function ω : [0,1] → W, but not all such
functions are valid tracks. A map M defines the set of tracks Ω_M it permits.
In a highway map, some tracks will be aligned along the lanes while others will 
correspond to merges and exits. 

We assume that an agent’s decision logic does not depend on exactly which of 
the infinitely many tracks it is following, but instead, it depends only on which 
type of track it is following or the track mode. In the example in Sect. 2, the
track modes are T0, T1, M01, etc. Every (blue) track for transitioning from a point
on T0 to the corresponding point on T1 has track mode M01. A map has a finite
set of track modes L_M and a labeling function V_M : Ω_M → L_M that maps each
track to a track mode. It also has a mapping g_M : W × L_M → Ω_M that maps
a track mode and a specific position in the workspace to a specific track.

Finally, a Verse agent’s decision logic can change its internal mode or tactical
mode P (e.g., Normal to MoveUp). When an agent changes its tactical
mode, it may also update the track it is following, and this is encoded in the
track graph function Tg_M : L_M × P × P → L_M, which takes the current
track mode, the current and the next tactical mode, and generates the new
track mode the agent should follow. For example, when the tactical mode of
a drone changes from Normal to MoveUp while it is on T1, the map function
Tg_M(T1, Normal, MoveUp) = M10 informs that the agent should follow a track
with mode M10. These sets and functions together define a Verse map object
M = (L_M, V_M, g_M, Tg_M). We will drop the subscript M when the map being
used is clear from context. 
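
As an illustration, the track graph function for the three-layer example can be sketched as a lookup table in Python. This is not the Verse map implementation; the enum and function names are hypothetical, and only the entries Tg(T1, Normal, MoveUp) = M10, (MoveUp, M10) to (Normal, T0), and (MoveUp, M21) to (Normal, T1) come from the text, while the remaining entries are assumed by symmetry.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class TrackMode(Enum):
    T0 = auto()
    T1 = auto()
    T2 = auto()
    M01 = auto()
    M10 = auto()
    M12 = auto()
    M21 = auto()


class TacticalMode(Enum):
    Normal = auto()
    MoveUp = auto()
    MoveDown = auto()


L, P = TrackMode, TacticalMode

# (current track mode, current tactical mode, next tactical mode) -> new track mode
TRACK_GRAPH: Dict[Tuple[TrackMode, TacticalMode, TacticalMode], TrackMode] = {
    (L.T1, P.Normal, P.MoveUp): L.M10,    # stated in the text
    (L.M10, P.MoveUp, P.Normal): L.T0,    # stated in the text
    (L.M21, P.MoveUp, P.Normal): L.T1,    # stated in the text
    (L.T2, P.Normal, P.MoveUp): L.M21,    # assumed by symmetry
    (L.T0, P.Normal, P.MoveDown): L.M01,  # assumed by symmetry
    (L.T1, P.Normal, P.MoveDown): L.M12,  # assumed by symmetry
    (L.M01, P.MoveDown, P.Normal): L.T1,  # assumed by symmetry
    (L.M12, P.MoveDown, P.Normal): L.T2,  # assumed by symmetry
}


def track_graph(track: TrackMode, cur: TacticalMode,
                nxt: TacticalMode) -> Optional[TrackMode]:
    """Return the new track mode, or None if the map defines no such transition."""
    return TRACK_GRAPH.get((track, cur, nxt))
```

A table of this kind keeps the agent's decision logic independent of the geometry: the same logic can run on another map that supplies its own entries for the same tactical modes.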


Agents. A Verse agent is defined by modes and continuous state variables, a 
decision logic that defines (possibly nondeterministic) discrete transitions, and a 
flow function that defines continuous evolution. An agent A is compatible with a 
map M if the agent’s tactical modes P are a subset of the allowed input tactical 
modes for Tg_M. This makes it possible to instantiate the same agent on different
compatible maps. The mode space for an agent instantiated on map M is the set 
D = L x P, where L is the set of track modes in M and P is the set of tactical 
modes of the agent. The continuous state space is X = W x Z, where W is the 
workspace (of M) and Z is the space of other continuous state variables. The 
(full) state space is the Cartesian product Y = X x D. In the two-drone example 
in Sect. 2, the continuous states variables are the positions and velocities along 
the three axes of the workspace. The modes are (Normal, T1), (MoveUp, M10), etc. 

An agent A in map M with $k-1$ other agents is defined by a tuple
$A = (Y, Y^0, G, R, F)$, where Y is the state space and $Y^0 \subseteq Y$ is the set of
initial states. The guard G and reset R functions jointly define the discrete
transitions. For a pair of modes $d, d' \in D$, $G(d, d') \subseteq X^k$ defines the condition
under which a transition from d to d' is enabled. The function
$R(d, d') : X^k \to X$ specifies how the continuous states of the agent are updated
when the mode switch happens. Both of these functions take as input the sensed
continuous states of all the other $k-1$ agents in the scenario. The G and the R
functions are not defined separately,
but are extracted by the Verse parser from a block of structured Python code as
shown in Fig. 2. The discrete states in if conditions and assignments define the
source and destination of discrete transitions. The if conditions involving
continuous states define guards for the transitions, and assignments of continuous
states define resets. Expressions with the any and all functions are unrolled to
disjunctions and conjunctions according to the number of agents k.
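
The unrolling of any and all can be pictured with a small sketch. The State class, the 5-unit threshold, and all function names below are hypothetical and only illustrate the idea for k = 3 agents (one ego agent and two others); this is not the code the Verse parser generates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """Hypothetical sensed state: position along the track and track mode."""
    x: float
    track_mode: str


def too_close_any(ego: State, others: List[State]) -> bool:
    # Quantified form, as a user might write it in the decision logic.
    return any(other.track_mode == ego.track_mode and 0 < other.x - ego.x < 5
               for other in others)


def too_close_unrolled(ego: State, other_1: State, other_2: State) -> bool:
    # Same condition for k = 3: `any` becomes one disjunct per other agent.
    return ((other_1.track_mode == ego.track_mode and 0 < other_1.x - ego.x < 5)
            or (other_2.track_mode == ego.track_mode and 0 < other_2.x - ego.x < 5))


def all_clear_all(ego: State, others: List[State]) -> bool:
    # Quantified form using `all`.
    return all(abs(other.x - ego.x) >= 5 for other in others)


def all_clear_unrolled(ego: State, other_1: State, other_2: State) -> bool:
    # Same condition for k = 3: `all` becomes one conjunct per other agent.
    return abs(other_1.x - ego.x) >= 5 and abs(other_2.x - ego.x) >= 5


ego = State(0.0, "T1")
others = [State(3.0, "T1"), State(20.0, "T0")]
print(too_close_any(ego, others), too_close_unrolled(ego, *others))
print(all_clear_all(ego, others), all_clear_unrolled(ego, *others))
```

Once the quantifiers are gone, each guard is a plain Boolean combination of constraints over a fixed, finite set of state variables.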

For example, in Fig. 2, Lines 47-50 define transitions (MoveUp, M10) to
(Normal, T0) and (MoveUp, M21) to (Normal, T1). The change of track mode is
given by the $T_M$ function. The guard for this transition comes from the if
condition at Line 48:
$G((\text{MoveUp}, \text{M10}), (\text{Normal}, \text{T0})) = \{x \mid -1 < \text{T0}.pz - x.pz < 1\}$
for $x \in X$, as given by the user-defined in_interval function. Here the continuous
states remain unchanged after the transition.

The final component of the agent is the flow function
$F : X \times D \times \mathbb{R}^{\geq 0} \to X$,
which defines the continuous time evolution of the continuous state. For any
initial condition $(x^0, d^0) \in Y$, $F(x^0, d^0)(\cdot)$ gives the continuous state of the
agent as a function of time. In this paper, we use F as a black-box function (see
Footnote 1).


Sensors and Scenarios. For a scenario with k agents, a sensor function
$S : Y^k \to Y^k$ defines the continuous observables as a function of the continuous
state. To simplify the exposition, in this paper we assume that observables have
the same type as the continuous state Y, and that each agent i is observed by
all other agents identically. This simple, overly transparent sensor model still
allows us to write realistic agents that only use information about nearby agents.
In a highway scenario, the observable part of agent j to another agent i may be
the relative distance $y_j = x_j - x_i$, and vice versa, which can be computed as a
function of the continuous state variables $x_i$ and $x_j$. A different sensor function,
which gives nondeterministic noisy observations, appears in Sect. 5.

A Verse scenario SC is defined by (a) a map M, (b) a collection of k agent
instances $\{A_1, \ldots, A_k\}$ that are compatible with M, and (c) a sensor S for the k
agents. Since all the agents are instantiated on the same compatible map M,
they share the same workspace. Currently, we require agents to have identical
state spaces, i.e., $Y_i = Y_j$, but they can have different decision logics and different
continuous dynamics.


4 Verse Scenario to Hybrid Verification 


In this section, we define the underlying hybrid system H(SC) that a Verse
scenario SC specifies. The verification questions that Verse is equipped to answer
are stated in terms of the behaviors or executions of H(SC). Verse’s notion of
a hybrid automaton is close to that in Definition 5 of [14]. The only uncommon
aspect in [14] is that the continuous flows may be defined by black-box
simulator functions instead of white-box analytical models (see Footnote 1).

Given a scenario with k agents $SC = (M, \{A_1, \ldots, A_k\}, S, P)$, the corresponding
hybrid automaton is $H(SC) = (X, X^0, D, D^0, G, R, TL)$, where

1. $X := \prod_i X_i$ is the continuous state space. An element $x \in X$ is called a state.
$X^0 := \prod_i X_i^0 \subseteq X$ is the set of initial continuous states.

2. $D := \prod_i D_i$ is the mode space. An element $d \in D$ is called a mode.
$D^0 := \prod_i D_i^0 \subseteq D$ is the finite set of initial modes.

3. For a mode pair $d, d' \in D$, $G(d, d') \subseteq X$ defines the continuous states from
which a transition from d to d' is enabled. A state $x \in G(d, d')$ iff there
exists an agent $i \in \{1, \ldots, k\}$ such that $x_i \in G_i(d_i, d_i')$ and $d_j = d_j'$ for $j \neq i$.

4. For a mode pair $d, d' \in D$, $R(d, d') : X \to X$ defines the change of continuous
states after a transition from d to d'. For a continuous state $x \in X$, the i-th
component of $R(d, d')(x)$ is $R_i(d_i, d_i')(x)$ if $x \in G_i(d_i, d_i')$, and $x_i$ otherwise.

5. TL is a set of pairs $(\xi, d)$, where the trajectory $\xi : [0, T] \to X$ describes the
evolution of continuous states in mode $d \in D$. Given $d \in D$ and $x^0 \in X$, $\xi$ should
satisfy $\forall t \in \mathbb{R}^{\geq 0}$, $\xi_i(t) = F_i(x_i^0, d_i)(t)$.


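We denote by $\xi.\mathit{fstate}$, $\xi.\mathit{lstate}$, and $\xi.\mathit{ltime}$ the initial state $\xi(0)$, the last
state $\xi(T)$, and the duration $T$, respectively. For a sampling parameter $\delta > 0$
and a length $m$, a $\delta$-execution of a hybrid automaton $H = H(SC)$ is a sequence
of $m$ labeled trajectories $\alpha := (\xi^0, d^0), \ldots, (\xi^{m-1}, d^{m-1})$, such that
(1) $\xi^0.\mathit{fstate} \in X^0$ and $d^0 \in D^0$,
(2) for each $i \in \{1, \ldots, m-1\}$, $\xi^i.\mathit{lstate} \in G(d^i, d^{i+1})$ and
$\xi^{i+1}.\mathit{fstate} = R(d^i, d^{i+1})(\xi^i.\mathit{lstate})$, and
(3) for each $i \in \{1, \ldots, m-1\}$, $\xi^i.\mathit{ltime} = \delta$ for $i \neq m-1$ and
$\xi^i.\mathit{ltime} \leq \delta$ for $i = m-1$.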

We define the first and last state of an execution
$\alpha = (\xi^0, d^0), \ldots, (\xi^{m-1}, d^{m-1})$ as $\alpha.\mathit{fstate} = \xi^0.\mathit{fstate}$ and
$\alpha.\mathit{lstate} = \xi^{m-1}.\mathit{lstate}$, and the first and last mode as $\alpha.\mathit{fmode} = d^0$ and
$\alpha.\mathit{lmode} = d^{m-1}$. The set of reachable states is defined by
$\mathit{Reach}_H := \{\alpha.\mathit{lstate} \mid \alpha \text{ is an execution of } H\}$. In addition, we denote the
reachable states in a specific mode $d \in D$ as $\mathit{Reach}_H(d)$, and $\mathit{Reach}_H(T)$ denotes
the set of reachable states at time T. Similarly, denoting the unsafe states for
mode d as $U(d)$, the safety verification problem for H can be solved by checking
whether $\forall d \in D$, $\mathit{Reach}_H(d) \cap U(d) = \emptyset$. Next, we discuss Verse functions for
verification via reachability.
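
The safety condition above can be sketched for the common case where both the reachable sets and the unsafe sets are given as unions of axis-aligned hyperrectangles, one collection per mode. The representation and the function names below are assumptions made for illustration; they are not Verse's internal data structures.

```python
from typing import Dict, List, Tuple

# A hyperrectangle is a list of (low, high) bounds, one pair per dimension.
Rect = List[Tuple[float, float]]


def intersects(a: Rect, b: Rect) -> bool:
    """Two closed boxes intersect iff their intervals overlap in every dimension."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))


def is_safe(reach: Dict[str, List[Rect]], unsafe: Dict[str, List[Rect]]) -> bool:
    """Check that Reach(d) and U(d) are disjoint for every mode d."""
    for mode, rects in reach.items():
        for r in rects:
            for u in unsafe.get(mode, []):
                if intersects(r, u):
                    return False
    return True


reach = {"Normal": [[(0.0, 1.0), (0.0, 1.0)]],
         "MoveUp": [[(1.0, 2.0), (3.0, 4.0)]]}
unsafe = {"MoveUp": [[(5.0, 6.0), (3.0, 4.0)]]}
print(is_safe(reach, unsafe))
```

If the reachable sets are over-approximations, a True answer is a proof of safety, while a False answer may be a false alarm caused by the approximation.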


Verification Algorithms in Verse. The Verse library comes with several 
built-in verification algorithms, and it provides functions that users can use 
to implement powerful new algorithms. We describe the basic algorithm and 
functions in this section. 

Consider a scenario SC with k agents and the corresponding hybrid automaton
H(SC). For a pair of modes $d, d'$, the standard discrete $\mathit{post}_{d,d'} : X \to X$
and continuous $\mathit{post}_{d,\delta} : X \to X$ operators are defined as follows: for any
states $x, x' \in X$, $\mathit{post}_{d,d'}(x) = x'$ iff $x \in G(d, d')$ and $x' = R(d, d')(x)$; and
$\mathit{post}_{d,\delta}(x) = x'$ iff $\forall i \in \{1, \ldots, k\}$, $x_i' = F_i(x_i, d_i, \delta)$. These operators are also
lifted to sets of states in the usual way. Verse provides postCont to compute
$\mathit{post}_{d,\delta}$ and postDisc to compute $\mathit{post}_{d,d'}$. Instead of computing the exact post,
postCont and postDisc compute over-approximations using improved implementations
of the algorithms in [14]. Verse’s verify function implements a reachability
analysis algorithm using these post operators. The algorithm constructs an execution
tree $\mathit{Tree} = (V, E)$ up to depth m in breadth-first order. Each vertex $(S, d) \in V$
is a pair of a set of states and a mode. The root is $(X^0, d^0)$. There is an edge from
$(S, d)$ to $(S', d')$ iff $S' = \mathit{post}_{d',\delta}(\mathit{post}_{d,d'}(S))$. The safety conditions are checked
while the tree is constructed. Currently, Verse implements only bounded-time
reachability; however, basic unbounded-time analysis with fixed-point checks
could be added following [14,32].
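
The tree construction can be sketched on a toy one-dimensional, single-agent automaton. Everything below is an illustrative assumption: the two modes, their constant velocities, the guards, the identity reset, and the functions post_disc, post_cont and verify are stand-ins written for this sketch, and they compute exact posts on intervals rather than the over-approximations that Verse's postCont and postDisc produce.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
Vertex = Tuple[Interval, str]

# Toy automaton (hypothetical): constant velocity per mode, interval guards.
VELOCITY: Dict[str, float] = {"up": 1.0, "down": -1.0}
GUARDS: Dict[Tuple[str, str], Interval] = {
    ("up", "down"): (2.0, float("inf")),   # switch to "down" once x >= 2
    ("down", "up"): (float("-inf"), 0.0),  # switch to "up" once x <= 0
}


def post_disc(s: Interval, d: str, d_next: str) -> Optional[Interval]:
    """Discrete post: states in s that satisfy the guard (reset is the identity)."""
    guard = GUARDS.get((d, d_next))
    if guard is None:
        return None
    lo, hi = max(s[0], guard[0]), min(s[1], guard[1])
    return (lo, hi) if lo <= hi else None


def post_cont(s: Interval, d: str, delta: float) -> Interval:
    """Continuous post: states reached from s after flowing for delta in mode d."""
    shift = VELOCITY[d] * delta
    return (s[0] + shift, s[1] + shift)


def verify(x0: Interval, d0: str, delta: float, depth: int,
           unsafe: Dict[str, Interval]) -> Tuple[bool, List[Vertex], List[Tuple[Vertex, Vertex]]]:
    """Build the execution tree breadth-first and check safety at every vertex."""
    root: Vertex = (x0, d0)
    vertices, edges, safe = [root], [], True
    queue = deque([(root, 0)])
    while queue:
        (s, d), level = queue.popleft()
        u = unsafe.get(d)
        if u is not None and s[0] <= u[1] and u[0] <= s[1]:
            safe = False  # the set at this vertex overlaps the unsafe set of mode d
        if level == depth:
            continue
        for d_next in VELOCITY:
            after_jump = post_disc(s, d, d_next)
            if after_jump is None:
                continue
            child: Vertex = (post_cont(after_jump, d_next, delta), d_next)
            vertices.append(child)
            edges.append(((s, d), child))
            queue.append((child, level + 1))
    return safe, vertices, edges


safe, vertices, edges = verify((2.0, 3.0), "up", 2.0, 3, {"down": (5.0, 6.0)})
print(safe, vertices)
```

Each edge applies the discrete post for the transition and then the continuous post in the destination mode, mirroring the edge condition stated above.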


5 Experiments and Use Cases 


We evaluate key features and algorithms in Verse through examples. We consider 
two types of agents: a 4-d ground vehicle with bicycle dynamics and the Stanley 
controller [22] and a 6-d drone with a NN-controller [23]. Each of these agents 
can be fitted with one of two types of decision logic: (1) a collision avoidance 
logic (CA) by which the agent switches to a different available track when it 
nears another agent on its own track, and (2) a simpler non-player vehicle logic 
(NPV) by which the agent does not react to other agents (and just follows its 
own track at constant speed). We denote the car agent with CA logic as agent 
C-CA, drone with NPV as D-NPV, and so on. We use four 2-d maps (M1-4)
and two 3-d maps (M5-6). M1 and M2 have 3 and 5 parallel straight tracks,
respectively. M3 has 3 parallel tracks with a circular curve. M4 is imported from
OpenDRIVE. M6 is the figure-8 map used in Sect. 2.


Safety Analysis with Multiple Drones in a 3-d Map. The first example is a
scenario with two drones, a D-CA agent (red) and a D-NPV agent (blue), in map
M5. The safety assertion requires the agents to always be separated by at least 1 m.
Figure 4(left) shows the computed reachable set, its projection on the x-position,
and its projection on the z-position. Since the agents are separated in space-time,
the scenario is verified safe. These plots are generated using Verse’s plotting functions.

Fig. 4. Left to right: (1) Computed reachtubes for a 2-drone scenario; (2) same reach-
tube projected on x-dimension, and (3) on z-dimension. Since there is no overlap in
space-time, there is no collision. (4) Reachtube for a 3-drone scenario; the red drone
violates the safety condition by entering the unsafe region after moving downward.
(Color figure online)


Checking Multiple Safety Assertions. Verse supports multiple safety assertions
specified using assert statements. For example, the user can specify unsafe
regions (Lines 77-78) or safe separation between agents (Lines 79-82) as shown
in Fig. 5. We add a second D-NPV to the previous scenario and both safety
assertions. The result is shown in the rightmost plot of Fig. 4. In this scenario, D-CA
violates the safety property by entering the unsafe region after moving downward
to avoid collision. The behavior of D-CA after moving upward is not influenced.
There is no violation of safe separation. Verse allows users to extract the set of
reachable states and mode transitions that lead to a safety violation.

77 assert not (ego.x > 40 and ego.x < 50 and \
78     ego.y > -5 and ego.y < 5 and ego.z > -10 and ego.z < -6), "Unsafe Region"
79 assert not any(ego.x-other.x < 1 and ego.x-other.x > -1 and \
80     ego.y-other.y < 1 and ego.y-other.y > -1 and \
81     ego.z-other.z < 1 and ego.z-other.z > -1 \
82     for other in others), "Safe Separation"


Fig. 5. Safety assertions for the three-drone scenario.
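
To make the meaning of the two assertions in Fig. 5 concrete, the sketch below evaluates the same conditions on individual states. The DroneState class and the function names are hypothetical. Verse checks such assertions against computed reachable sets, so this pointwise version only illustrates what each assertion rules out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DroneState:
    """Hypothetical container for the position part of a drone state."""
    x: float
    y: float
    z: float


def in_unsafe_region(ego: DroneState) -> bool:
    # Negation of the first assertion: ego lies inside the unsafe box.
    return 40 < ego.x < 50 and -5 < ego.y < 5 and -10 < ego.z < -6


def too_close(ego: DroneState, others: List[DroneState]) -> bool:
    # Negation of the second assertion: some other drone is within 1 unit
    # of ego in all three dimensions at once.
    return any(abs(ego.x - o.x) < 1 and abs(ego.y - o.y) < 1 and abs(ego.z - o.z) < 1
               for o in others)


def violated_assertions(ego: DroneState, others: List[DroneState]) -> List[str]:
    """Return the labels of the assertions that fail at this state."""
    labels = []
    if in_unsafe_region(ego):
        labels.append("Unsafe Region")
    if too_close(ego, others):
        labels.append("Safe Separation")
    return labels


ego = DroneState(45.0, 0.0, -8.0)
others = [DroneState(45.0, 0.0, -3.0), DroneState(10.0, 0.0, 0.0)]
print(violated_assertions(ego, others))
```

Returning labels rather than raising on the first failure makes it easy to see which of several assertions a state violates.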


Changing Maps. Verse allows users to easily create scenarios with different maps 
and port agents across compatible maps. We start with a scenario with one C-CA 
agent (red) and two C-NPV agents (blue, green) in M1. The safety assertion 
is that the vehicles should be at least 1m apart in both x and y-dimensions. 
Figure 6(left) shows the verification result: safety is not violated. However, if
we switch to map M3 by changing one line in the scenario definition, a reachability
analysis shows that a safety violation can happen after C-CA merges left
(Fig. 6(center)). In addition, Verse allows importing maps from the OpenDRIVE [4]
format. An example is included in the extended version of the paper [26].

Fig. 6. Left: running the three-car scenario on a map with parallel straight lanes. Center:
same scenario with a curved map. Right: same scenario with a noisy sensor. (Color
figure online)


Adding Noisy Sensors. Verse supports scenarios with different sensor functions. 
For example, the user can create a noisy sensor function that mimics a realistic 
sensor with bounded noise. Such sensor functions are easily added to the scenario 
using the set_sensor function. 

Figure 6(right) shows exactly the same three-car scenario with a noisy sensor,
which adds ±0.5 m noise to the perceived position of all other vehicles. Since
the sensed values of other agents only impact the checking of the guards (and
hence the transitions) of the agents, Verse internally bloats the reachable set of
positions for the other agents by ±0.5 m while checking guards. Compared with the
behavior of the same agent with no sensor noise (shown in yellow in Fig. 6(right)),
the sensor noise enlarges the region over which the transition can happen, which
in turn enlarges the reachtubes for the red agent.
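
The effect of bloating on guard checking can be sketched in one dimension. The guard used here (the other vehicle is less than a given gap ahead), the numbers, and the function names are assumptions chosen for illustration; they are not taken from the Verse implementation.

```python
from typing import Tuple

Interval = Tuple[float, float]


def bloat(interval: Interval, noise: float) -> Interval:
    """Enlarge an interval by a bounded sensing error on both sides."""
    lo, hi = interval
    return (lo - noise, hi + noise)


def guard_may_hold(ego_x: Interval, other_x: Interval, gap: float) -> bool:
    """True if some ego position and some sensed other position satisfy
    other - ego < gap, i.e. the guard could be enabled somewhere in the sets."""
    smallest_difference = other_x[0] - ego_x[1]
    return smallest_difference < gap


ego = (0.0, 1.0)      # reachable ego positions
other = (6.25, 7.0)   # reachable positions of the other vehicle

# With an exact sensor the guard cannot be enabled.
print(guard_may_hold(ego, other, gap=5.0))
# With a bounded sensing error of 0.5 the guard may be enabled.
print(guard_may_hold(ego, bloat(other, 0.5), gap=5.0))
```

Because only the sensed copy of the other agent is enlarged, the noise changes when transitions can fire without changing the other agent's own reachable set.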


Plugging in Different Reachability Engines. With a little effort, Verse allows
users to plug in different reachability tools for the postCont computation. The
user will need to modify the interface of the reachability tool so that, given a
set of initial states, a mode, and a non-negative value $\delta$, the reachability tool
can output the set of reachable states over a $\delta$-period represented by a set of
timed hyperrectangles. Currently, Verse implements computing postCont using
DryVR [14], NeuReach [35] and Mixed Monotone Decomposition [12]. A scenario
with two car agents in map M1 verified using NeuReach and DryVR is included
in the extended version of the paper [26].


Incremental Verification. We implemented an incremental verification algorithm
in Verse called verifyInc. This algorithm improves verify by caching and
reusing reachtubes, and can be effective when analyzing a sequence of slightly
different scenarios. The function verifyInc avoids re-computing $\mathit{post}_{d,d'}$ and
$\mathit{post}_{d,\delta}$ when constructing the execution tree by reusing earlier execution runs.
Experiments show that verifyInc reduces running time by 10x for two identical
runs and 2x when the decision logic is slightly modified. (More details are
provided in the extended version of the paper [26].) This exercise illustrates a use
of Verse in creating alternative analysis algorithms.
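
The caching idea can be sketched as a wrapper around a post operator. This is not the verifyInc implementation: the class CachedPost, the exact-match cache key, and the toy slow_post_cont function are all assumptions made for illustration, and a real implementation could, for example, also reuse results for initial sets contained in previously analyzed ones.

```python
from typing import Callable, Dict, Tuple

Interval = Tuple[float, float]
PostFn = Callable[[Interval, str, float], Interval]


class CachedPost:
    """Memoize a post operator on (initial set, mode, delta)."""

    def __init__(self, post: PostFn) -> None:
        self._post = post
        self._cache: Dict[Tuple[Interval, str, float], Interval] = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, init_set: Interval, mode: str, delta: float) -> Interval:
        key = (init_set, mode, delta)
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[key] = self._post(init_set, mode, delta)
        return self._cache[key]


def slow_post_cont(init_set: Interval, mode: str, delta: float) -> Interval:
    # Stand-in for an expensive reachability computation (toy dynamics).
    velocity = {"up": 1.0, "down": -1.0}[mode]
    return (init_set[0] + velocity * delta, init_set[1] + velocity * delta)


cached = CachedPost(slow_post_cont)
first = cached((0.0, 1.0), "up", 2.0)   # computed
second = cached((0.0, 1.0), "up", 2.0)  # served from the cache
print(first, second, cached.hits, cached.misses)
```

When a scenario is re-verified after a small change, vertices of the execution tree whose inputs are unchanged hit the cache, which is where savings of the kind reported above would come from.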


Table 1 summarizes the running time of verifying all the examples in this
section. We additionally include three standard benchmarks: van-der-pol (Agent
V) [20], spacecraft rendezvous (Agent S) [20], and gearbox (Agent G) [2]. As
expected, the running times increase with the number of discrete mode transitions.
However, for a complicated scenario with 7 agents and 37 transitions, the
verification can still finish in under 6 min, which suggests some level of scalability.
The choice of reachability engine can also impact running time. For the
same scenario in rows 2, 3 and 10, 11, Verse with NeuReach (see Footnote 2) as the
reachability engine takes more time than with DryVR.


Table 1. Runtime for verifying the examples in Sect. 5. Columns are: number of agents (#A), agent
type (A), map used (Map), reachability engine used (postCont), whether a noisy sensor is used (NS),
number of mode transitions (#Tr), and the total run time in seconds (Rt). N/A means not available.


#A | A | Map | postCont | NS  | #Tr | Rt (s)
2  | D | M6  | DryVR    | No  | 8   | 55.9
2  | D | M5  | DryVR    | No  | 5   | 18.7
2  | D | M5  | NeuReach | No  | 5   | 1071.2
3  | D | M5  | DryVR    | No  | 7   | 39.6
7  | C | M2  | DryVR    | No  | 37  | 322.7
3  | C | M1  | DryVR    | No  | 5   | 23.4
3  | C | M3  | DryVR    | No  | 4   | 34.7
3  | C | M4  | DryVR    | No  | 7   | 118.3
3  | C | M1  | DryVR    | Yes | 5   | 29.4
2  | C | M1  | DryVR    | No  | 5   | 21.6
2  | C | M1  | NeuReach | No  | 5   | 914.9
1  | V | N/A | DryVR    | N/A | 1   | 0.33
1  | S | N/A | DryVR    | N/A | 3   | 23
1  | G | N/A | DryVR    | N/A | 3   | 67.14


6 Related Work 


Automatic hybrid verification tools typically require the input model to be written
in a tool-specific language [10,13-15,17,25]. Libraries like JuliaReach [7],
Hylaa [5], and HyPro [8] share our motivation to reduce the usability barrier by
providing reachability analysis APIs for popular programming languages. Verse
is distinct in this family in that it supports creation and analysis of multi-agent
scenarios. The work in [33] also supports multiple agents; however, Verse
significantly improves usability with maps, scenarios and decision logics written in
Python.

Footnote 2. Runtime for NeuReach includes training time.

Interactive theorem provers have been used for modeling and verification 
of multi-agent and hybrid systems [16,19,27,29]. KeYmeraX [19] uses quantified 
differential dynamic logic for specifying multi-agent scenarios and supports proof 
search and user defined tactics. Isabelle/HOL [16], PVS [27], and Maude [29] have 
also been used for limited classes of hybrid systems. These approaches are geared 
for a different user segment in that they provide higher expressive and analytical 
power to expert users. Verse is inspired by widely used tools for simulating multi- 
agent scenarios [9,18,28,30,36]. While the models created in these tools can be 
flexible and expressive, currently they are not amenable to formal verification. 


7 Conclusions and Future Directions 


In this paper, we presented the new open source Verse library for broadening 
applications of hybrid system verification technologies to scenarios involving mul- 
tiple interacting decision-making agents. There are several future directions for 
Verse. Verse currently assumes all agents interact with each other only through 
the sensor in the scenario and all agents share the same sensor. This restriction 
could be relaxed to have different types of asymmetric sensors. Functions for 
constructing and systematically sampling scenarios could be developed. Func- 
tions for post-computation for white-box models by building connections with 
existing tools [1,10,15] would be a natural next step. Those approaches could 
obviously utilize the symmetry property of agent dynamics as in [32,34], but 
beyond that, new types of symmetry reductions should be possible by exploiting 
the map geometry. 

Abstract. Given a specification as a Boolean relation between inputs
and outputs, Boolean functional synthesis generates a function, called a 
Skolem function, for each output in terms of the inputs such that the 
specification is satisfied. In general, there may be many possibilities for 
Skolem functions satisfying the same specification, and criteria to pick 
one or the other may vary from specification to specification. 

In this paper, we develop a technique to represent the space of Skolem 
functions in a criteria-agnostic form that makes it possible to subse- 
quently extract Skolem functions for different criteria. Our focus is on 
identifying such a form and on developing a compilation algorithm for 
this form. Our approach is based on a novel counter-example guided 
strategy for existentially quantifying a subset of variables from a spec- 
ification in negation normal form. We implement this technique and 
compare our performance with those of other knowledge compilation 
approaches for Boolean functional synthesis, and show promising results. 


1 Introduction 


Manually designing systems that satisfy complex user-provided specifications
can be notoriously tricky. Automated synthesis has therefore attracted significant
attention from researchers over the past few decades [1-5]. In this paradigm,
a user describes the desired behaviour of a system as a relational specification
between its inputs and outputs, and an algorithm automatically generates an
implementation, such that the specification is provably satisfied. In this paper,
we focus only on systems with Boolean inputs and outputs with relational
specifications given as Boolean formulas. The synthesis problem in this setting is
also called Boolean functional synthesis. Formally, let $\varphi(X, Y)$ be a Boolean
formula representing the specification, where $X = (x_1, \ldots, x_m)$ is a vector of
Boolean inputs and $Y = (y_1, \ldots, y_n)$ a vector of Boolean outputs of the system.
Boolean functional synthesis requires us to generate a vector of Boolean functions
$\Psi(X) = (\psi_1(X), \ldots, \psi_n(X))$ such that
$\forall X\, \big(\exists Y\, \varphi(X, Y) \Leftrightarrow \varphi(X, \Psi(X))\big)$.
For each $i \in \{1, \ldots, n\}$, the function $\psi_i(X)$ is called a Skolem function for $y_i$ in
$\varphi(X, Y)$, and $\Psi(X)$ is called a Skolem function vector.

There are several interesting applications of Boolean functional synthesis, 
including automated program synthesis, circuit repair and debugging, crypt- 
analysis and the like [2,6-10]. This has motivated researchers to develop novel 
algorithms for solving increasingly larger and more complex synthesis bench- 
marks [11-19]. Each such algorithm generates a single Skolem function vector 
for a given relational specification, thereby providing an implementation of the 
system. However, there may be many alternative function vectors that also serve 
as Skolem function vectors for the same specification. Some of these may yield 
system implementations that are more “desirable” than those obtained from other 
Skolem function vectors, when non-functional metrics like size of program/circuit 
needed for implementation, ease of understandability etc. are considered. There- 
fore, having a tool output a single Skolem function vector (chosen by the tool, 
without any user agency in the choice) can be restrictive in terms of implemen- 
tation choices available to the user. 

One way to address the above problem is to use a knowledge compilation app- 
roach, i.e. to compile the specification to a special normal form from which it is 
relatively easy to use downstream logic synthesis tools to generate any Skolem 
function vector optimizing user-specified criteria. Unfortunately, earlier work on
knowledge compilation for Boolean functional synthesis [13,14,20] does not allow
us to do this easily. It simply allows efficient synthesis of one (among possibly
many) Skolem function vector from the compiled representation. Moreover,
the user has no agency in choosing which Skolem function vector is synthesized; 
all choices are made implicitly deep inside heuristics of the compilation algo- 
rithms. For example, if we compile a relational specification to wONNF [14] or 
SynNNF [13], the only guarantee we have is that the so-called GACKS Skolem 
functions (see [14]) can be efficiently synthesized from the compiled forms. But 
what if these functions are not the user’s preferred choice of Skolem functions 
for an application? Unfortunately, not much can be done if we compile the spec- 
ification to wONNF or SynNNF. Similarly, the compilation approach proposed 
in [20] allows efficient synthesis of Skolem functions of yet another form, but 
even here, the user hardly has any agency in choosing which (among many alter- 
native) Skolem function vectors is actually output. Existing algorithms therefore 
effectively restrict the semantic choice of Skolem functions with hardly any way 
for the user to influence this choice. Once the semantic choice has been made by 
the compiler, the only agency the user has is in optimizing the implementation of 
this semantic choice. We believe the inability of existing compilation approaches 
to allow the user semantic choice of Skolem functions is a limiting factor in prac- 
tical usage of these works. In this paper, we take a first step towards remedying 
this problem. 

The central question we ask in this paper is: Can we compile a Boolean rela- 
tional specification to a representation that does not restrict the semantic choice 
of Skolem functions, and yet allows easy deployment of downstream logic synthe- 
sis tools to obtain Skolem functions customized to user-provided criteria? Our 
main result is an affirmative answer to this question. We also design and implement
an algorithm that compiles a given specification in negation normal form
to such a representation form. We emphasize that our goal in this paper is not
to identify specific optimization criteria or to synthesize Skolem functions that 
optimize some specific criteria. Instead, we focus on developing a representation 
that makes it possible to use downstream logic optimization tools to synthesize 
Skolem functions satisfying user-provided criteria. Our experiments show that 
our approach is competitive performance-wise to earlier approaches that severely 
restrict the semantic choice of Skolem functions. 
The primary contributions of this paper can be summarized as follows. 


— We formalize the problem of symbolically and compactly representing all 
Skolem function vectors for a Boolean relational specification in such a way 
that it is amenable to downstream optimization by logic synthesis tools. 

— We propose a candidate for this representation as a set of pairs of functions, 
one for each output, which we call the Skolem basis vector. We show that the 
Skolem basis vector is guaranteed to exist for any specification and is unique 
with respect to an ordering of the output variables. 

— For single-output specifications, we show that the Skolem basis vector can 
be computed easily, as a pair of (semantically unique) Boolean functions. For 
multi-output specifications, we relate the problem of generating Skolem basis 
vector to the question of performing efficient quantification of outputs. 

— We investigate two properties, namely unateness and conflict-freeness of out- 
puts, that permit efficient quantification of outputs. This, in turn, allows a 
Skolem basis vector to be generated in polynomial time in special cases. 

— We present a novel counterexample-guided algorithm for transforming a spec- 
ification to one where a designated output variable is conflict-free. We call 
this process rectification of the output. 

— We present an overall algorithm that takes a specification and generates a 
Skolem basis vector by successively rendering outputs unate or conflict-free. 

— We present a tool implementing our algorithm, and report experimental 
results on a suite of publicly available benchmarks. 


Related Work. In knowledge compilation, the general goal is to represent a prob- 
lem specification in a form that allows specific questions to be answered effi- 
ciently (see e.g., [21-23]). In [22,24], representation forms for Boolean functions 
were proposed that allow efficient enumeration of all satisfying assignments of 
the function. However, this idea cannot be easily extended to enumerate Skolem 
functions, since the space of functions is doubly exponentially large in the num- 
ber of variables. For Boolean functional synthesis, [13,20,25,26] provide normal 
forms and present compilers that render synthesis of a single Skolem function 
vector easy. However, they do not provide the user any agency in choosing the 
Skolem function vector. In fact, the optimizations used in [13] preclude gen- 
eration of all Skolem function vectors for reasons of efficiency. In the current 
work, our focus is on symbolically representing the space of all Skolem function 
vectors, without necessarily converting the given specification to a semantically 
equivalent one in special normal form. Thus, the problem addressed in this paper 
is technically different from those addressed in [13,20,25,26]. Nevertheless, our
work can be viewed as knowledge representation for all Skolem functions.


2 A Motivating Example


We start with a simple example that illustrates some of the problems we wish 
to address. Suppose we are designing a memoryless arbiter that must arbitrate 
requests from three users for a shared resource. Let the arbiter inputs be Boolean 
variables $r_1, r_2, r_3$, where $r_i$ is true iff there is a request from user i. Let the
corresponding arbiter outputs be $g_1, g_2, g_3$, where $g_i$ is true iff access is granted
to user i. We want the arbiter to satisfy the following properties: (a) at most
one user must be granted access at a time, (b) if some user has requested access,
some user must be granted access, and (c) a user should be granted access only
if she has requested it. The above properties can be encoded as a specification
$\varphi = \varphi_1 \wedge \varphi_2 \wedge \varphi_3$, where
$\varphi_1 = (g_1 \rightarrow \neg(g_2 \vee g_3)) \wedge (g_2 \rightarrow \neg(g_1 \vee g_3)) \wedge (g_3 \rightarrow \neg(g_1 \vee g_2))$,
$\varphi_2 = (r_1 \vee r_2 \vee r_3) \rightarrow (g_1 \vee g_2 \vee g_3)$, and
$\varphi_3 = (g_1 \rightarrow r_1) \wedge (g_2 \rightarrow r_2) \wedge (g_3 \rightarrow r_3)$.

It turns out that there are many different Skolem function vectors
$\Psi = (\psi_1, \psi_2, \psi_3)$ for the above specification, where each $\psi_i$ gives a Skolem function
for $g_i$. We ran two state-of-the-art Boolean functional synthesis tools, viz.
Manthan2 [17] and BFSS [14], on this specification. BFSS required us to also specify
a linear order of outputs (we will shortly see why), and we used $g_1 < g_2 < g_3$.
Both tools solved the problem in no time, and each reported a Skolem function
vector without any room for the user to influence the choice of Skolem functions.
Specifically, the Skolem functions returned by Manthan2 can be represented by
the And-Inverter Graph (AIG) shown in Fig. 1a. Here, each circle represents
a two-input AND gate, and each dotted (resp. solid) edge represents a connection
with (resp. without) logical negation. Thus, the Skolem functions are:
$\psi_2 = r_2 \wedge \neg r_1 \wedge \neg r_3$, $\psi_1 = r_1 \wedge \neg r_3 \wedge \neg g_2$ and $\psi_3 = r_3 \wedge \neg g_1 \wedge \neg g_2$. Running
BFSS on the same specification yields Skolem functions represented by the AIG
in Fig. 1c. Here, $\psi_3 = r_3 \wedge \neg r_1 \wedge \neg r_2$, $\psi_2 = r_2 \wedge \neg g_3$ and $\psi_1 = r_1 \wedge \neg g_2 \wedge \neg g_3$.




Fig. 1. Unoptimized and optimized AIGs of Skolem functions 


Are the Skolem functions generated by the two tools in their simplest forms,
or did they miss some possibilities for optimization? To answer this, we used
a widely used logic optimization tool, viz. abc [27], to simplify the two AIGs
using commands to minimize the AND gate count and to balance lengths of
paths in the AIGs. The resulting simplified AIGs are shown in Fig. 1b (obtained
from Fig. 1a) and Fig. 1d (obtained from Fig. 1c). Thus, Manthan2’s solution is
equivalent to $\psi_3 = r_3$, $\psi_2 = r_2 \wedge \neg r_1 \wedge \neg r_3$, $\psi_1 = r_1 \wedge \neg r_3$, while BFSS’ solution
is equivalent to $\psi_2 = r_2$, $\psi_1 = r_1 \wedge \neg r_2$, $\psi_3 = r_3 \wedge \neg r_1 \wedge \neg r_2$. Note that the two
solutions are semantically equivalent modulo permutation of indices (although
this wasn’t obvious prior to optimization).

There are some important take-aways from this simple experiment. First, 
neither Manthan2 nor BFSS gave the user any agency in the semantic choice 
of the synthesized Skolem functions. The use of the abc tool with user-provided 
optimization criteria at the end simply gave us choice of implementation for 
the Skolem functions already determined by each tool. Significantly, there are
choices of Skolem function vectors, viz. $\psi_1 = r_1 \wedge (\neg r_2 \vee \neg r_3)$,
$\psi_2 = r_2 \wedge (\neg r_1 \vee r_3)$, $\psi_3 = (\neg r_1 \wedge \neg r_2 \wedge r_3)$, that are ignored by both
Manthan2 and BFSS (and by other tools like CADET [11]). This can lead to
ignoring “better” Skolem function vectors in general. The user’s criteria for
desirability of Skolem functions
may differ from one problem instance to another, and may be completely dif- 
ferent from what is hard-coded in the innards of a tool like Manthan2/BFSS. 
For example, the new Skolem function vector considered above admits an AIG 
representation in which input-to-output shortest (resp. longest) path lengths are 
equal across all outputs. This may indeed be a desirable feature in some appli- 
cation where variability of output delays matters. However, there is currently no 
way to influence BFSS/Manthan2 to arrive at Skolem functions optimized per 
such criteria. 
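
The claims in this example are small enough to check by brute force. The sketch below encodes the arbiter specification and tests candidate vectors against the definition of a Skolem function vector from Sect. 1 by enumerating all eight input assignments. The function names are hypothetical, and the candidate vectors are the simplified forms as read from the text above.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[bool, bool, bool]


def spec(r: Bits, g: Bits) -> bool:
    """Arbiter specification phi = phi1 and phi2 and phi3."""
    r1, r2, r3 = r
    g1, g2, g3 = g
    phi1 = ((not g1 or not (g2 or g3)) and (not g2 or not (g1 or g3))
            and (not g3 or not (g1 or g2)))            # at most one grant
    phi2 = (not (r1 or r2 or r3)) or (g1 or g2 or g3)  # some request implies some grant
    phi3 = (not g1 or r1) and (not g2 or r2) and (not g3 or r3)  # grant only if requested
    return phi1 and phi2 and phi3


def is_skolem_vector(psi: Callable[[bool, bool, bool], Bits]) -> bool:
    """Check: for all X, (exists Y. phi(X, Y)) iff phi(X, psi(X))."""
    for r in product([False, True], repeat=3):
        realizable = any(spec(r, g) for g in product([False, True], repeat=3))
        if realizable != spec(r, psi(*r)):
            return False
    return True


def manthan2_simplified(r1: bool, r2: bool, r3: bool) -> Bits:
    return (r1 and not r3, r2 and not r1 and not r3, r3)


def bfss_simplified(r1: bool, r2: bool, r3: bool) -> Bits:
    return (r1 and not r2, r2, r3 and not r1 and not r2)


def third_choice(r1: bool, r2: bool, r3: bool) -> Bits:
    return (r1 and (not r2 or not r3), r2 and (not r1 or r3),
            (not r1) and (not r2) and r3)


def mixed(r1: bool, r2: bool, r3: bool) -> Bits:
    # psi2 from the BFSS solution combined with psi3 from the Manthan2 solution;
    # psi1 is fixed to False here, but no choice of psi1 can repair the conflict.
    return (False, r2, r3)


for candidate in (manthan2_simplified, bfss_simplified, third_choice, mixed):
    print(candidate.__name__, is_skolem_vector(candidate))
```

Exhaustive enumeration only works for toy instances like this one, but it makes the point that the three valid vectors are genuinely different functions of the inputs.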

The above example also illustrates the important role played by logic opti- 
mization in obtaining efficient implementations of Skolem functions generated 
by state-of-the-art synthesis tools. However, using logic optimization as a post-processor
can only provide a better implementation of Skolem functions that have already been
chosen semantically. Fortunately, more than five decades of research in
logic optimization has resulted in mature (even commercial) tools that can do
much more than just implementation optimization. Specifically, don’t-care based
optimizations [28] can search within a specified space of (semantically distinct)
functions to choose one that is optimized according to given user criteria. Such
a choice involves a combined optimization across semantic and implementation 
choices. Given this capability of logic optimizers, and their indispensable use in 
synthesis flows, we posit that logic optimizers are the right engines to choose 
between alternative semantic choices of Skolem functions, in addition to opti- 
mizing their implementation. Of course, this requires specifying the semantic 
space of all (Skolem) functions in a form that can be easily processed by logic 
optimizers. State-of-the-art logic optimizers already allow specifying a family of 
functions using on-sets and don’t-care sets [29]. Therefore, we propose to use 
this representation for representing the space of Skolem functions as well. 

Before presenting the details of on-sets and don’t-care sets for Skolem func- 
tions in our example, we note that Skolem functions for different outputs cannot 
be chosen independently in general. For example, $\psi_3 = r_3$ is generated by Manthan2,
and $\psi_2 = r_2$ is generated by BFSS. However, there is no Skolem function
vector with $\psi_2 = r_2$ and $\psi_3 = r_3$, since this would lead to $g_2 = g_3 = 1$ when
$r_2 = r_3 = 1$. Therefore, any representation of the semantic space of all Skolem
function vectors must necessarily take into account dependence between Skolem 
functions for different outputs. One way to achieve this is to impose a linear 
order on the outputs, and to represent the set of Skolem functions for an output 
in terms of Skolem functions for preceding (in the order) outputs. With this app- 
roach, the semantic space of Skolem functions for each output can be expressed 
by two functions: one representing the set of assignments for which every Skolem 
function in the represented space must evaluate to 1 (i.e. on-set), and the other 
representing assignments for which it is ok for a Skolem function to evaluate to 
either 0 or 1 (i.e. don’t-care set). 

The above representation is analogous to representing vector spaces using 
a small set of mutually orthogonal basis vectors, where every vector in the 
space can be expressed as a linear combination of these basis vectors. In a 
similar manner, let A denote the on-set of a family of Skolem functions, and 
B denote the don’t-care set for the same family. Let GenImpl(B) denote the 
set of all generalized implicants of $B$, i.e. all formulas $\nu$ such that $\nu \Rightarrow B$.
Every Skolem function in the represented space can then be obtained (modulo
semantic equivalence) as $A \vee \nu$ where $\nu \in \mathrm{GenImpl}(B)$. Specifically, for our
example, with the order $g_1 \prec g_2 \prec g_3$ of outputs (same as that given to BFSS), we have
$A_1 = (\neg r_3 \wedge \neg r_2 \wedge r_1)$, $B_1 = (r_3 \vee r_2) \wedge r_1$, $A_2 = (\neg r_3 \wedge r_2 \wedge \neg g_1)$, $B_2 = r_3 \wedge r_2 \wedge \neg g_1$,
$A_3 = r_3 \wedge \neg g_2 \wedge \neg g_1$, $B_3 = 0$. The Karnaugh-maps shown below depict how the
space of all Skolem function vectors can be visualized in terms of $A_i$ and $B_i$.
To obtain a specific Skolem function vector, we must place a 1 in each $A_i$-cell,
choose a subset of the $B_i$ cells and place 1’s in those cells and 0’s in the
remaining $B_i$ cells. Each such choice provides a semantically distinct Skolem function
vector, and every Skolem function vector corresponds to one such choice. Specifically,
the Skolem function vector missed by Manthan2/BFSS can now be easily
obtained by choosing the red and blue $B_1$ cells and the teal $B_2$ cell to be 1 in
the Karnaugh-maps. Similarly, Manthan2’s solution is obtained by choosing the
blue $B_1$ cell and teal $B_2$ cell to be 1, and BFSS’ solution is obtained by choosing
the red $B_1$ cell and teal $B_2$ cell to be 1. Allowing a logic optimizer to optimize
Skolem functions with the spaces represented by $(A_1, B_1, A_2, B_2, A_3, B_3)$ therefore
makes it possible to synthesize each of these Skolem function vectors. This
motivates compiling a given specification into an $(A_i, B_i)$ pair for the Skolem
functions for each output $y_i$.


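Space of Sk fns for g1 (rows: r1, columns: r2 r3)

  r1 \ r2r3 | 00 | 01 | 11 | 10
      0     | 0  | 0  | 0  | 0
      1     | A1 | B1 | B1 | B1

Space of Sk fns for g2 (rows: g1, columns: r2 r3)

  g1 \ r2r3 | 00 | 01 | 11 | 10
      0     | 0  | 0  | B2 | A2
      1     | 0  | 0  | 0  | 0

Space of Sk fns for g3 (rows: g1, columns: g2 r3)

  g1 \ g2r3 | 00 | 01 | 11 | 10
      0     | 0  | A3 | 0  | 0
      1     | 0  | 0  | 0  | 0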
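
The way a pair of basis functions spans a family of Skolem functions can be checked by brute force. The sketch below is only an illustration under stated assumptions: it encodes the on-set $A_1$ and don't-care set $B_1$ for output $g_1$ from the running example as Python predicates (the helper names `family` and `truth_table` are hypothetical), enumerates every function of the form $A_1 \vee \nu$ with $\nu \Rightarrow B_1$, and tests whether the three candidate functions for $g_1$ discussed in this section belong to that family.

```python
from itertools import combinations, product

POINTS = list(product((0, 1), repeat=3))  # assignments to (r1, r2, r3)

def A1(r1, r2, r3):
    return bool(r1 and not r2 and not r3)   # on-set for g1

def B1(r1, r2, r3):
    return bool(r1 and (r2 or r3))          # don't-care set for g1

def truth_table(f):
    return tuple(int(bool(f(*p))) for p in POINTS)

def family(on_set, dont_care):
    """All truth tables of the form A or v, where v implies B."""
    on = {p for p in POINTS if on_set(*p)}
    dc = [p for p in POINTS if dont_care(*p)]
    tables = set()
    for k in range(len(dc) + 1):
        for chosen in combinations(dc, k):
            ones = on | set(chosen)
            tables.add(tuple(int(p in ones) for p in POINTS))
    return tables

candidates = family(A1, B1)
manthan2_psi1 = truth_table(lambda r1, r2, r3: r1 and not r3)
bfss_psi1 = truth_table(lambda r1, r2, r3: r1 and not r2)
other_psi1 = truth_table(lambda r1, r2, r3: r1 and (not r2 or not r3))

print(len(candidates))
print(manthan2_psi1 in candidates, bfss_psi1 in candidates, other_psi1 in candidates)
```

Since $B_1$ has three satisfying assignments, the family contains $2^3 = 8$ functions, and all three candidates are members of it.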
3 Preliminaries and Notation


Let $Z = (z_1, \ldots, z_n)$ be a vector of Boolean variables. A literal is a variable
($z_i$) or its complement ($\neg z_i$), a clause is a disjunction of literals and a cube is a
conjunction of literals. For $1 \le i \le j \le n$, we use $Z_i^j$ to denote the slice $(z_i, \ldots, z_j)$
of the vector $Z$. An $n$-input Boolean function is a mapping from $\{0,1\}^n$ to
$\{0,1\}$. A Boolean formula $\varphi(Z)$ is a syntactic object whose semantics is given
by a mapping from $\{0,1\}^n$ to $\{0,1\}$. Thus, every Boolean formula represents
a unique Boolean function, and every Boolean function can be represented by
a (not necessarily unique) Boolean formula. Henceforth, we refer to Boolean
formulas and Boolean functions interchangeably.

The support of $\varphi(Z)$, denoted $\mathrm{sup}(\varphi)$, is the set of variables in $Z$. For ease
of exposition, we will abuse notation and use $Z$ to denote either a vector or the
underlying set of elements, depending on the context. A complete (resp. partial)
assignment $\pi$ for $Z$ is a complete (resp. partial) mapping from $Z$ to $\{0,1\}$. The
value of variable $z_i$ assigned by $\pi$ is denoted $\pi[z_i]$. A complete assignment $\pi$ of
$Z$ is a satisfying assignment for $\varphi(Z)$ if the Boolean function represented by $\varphi$
evaluates to 1 when all variables in $\mathrm{sup}(\varphi)$ are assigned values given by $\pi$. In
this case, we say that $\pi \models \varphi$. A formula $\varphi(Z)$ is satisfiable if it has at least one
satisfying assignment; otherwise it is unsatisfiable. We say that two formulas on $n$
variables are equivalent if they represent the same semantic mapping from $\{0,1\}^n$
to $\{0,1\}$. Given Boolean formulas $\varphi$ and $\alpha$ with $z_i \in \mathrm{sup}(\varphi)$, we use $\varphi[z_i \mapsto \alpha]$
to denote the formula obtained by substituting $\alpha$ for every occurrence of $z_i$ in
$\varphi$. We use $\varphi|_{z_i=1}$ (resp. $\varphi|_{z_i=0}$) to denote the formula obtained by setting $z_i$ to
1 (resp. 0) in the formula $\varphi(Z)$. The resulting formulas are also called positive
(resp. negative) co-factors of $\varphi$ w.r.t. $z_i$. For notational convenience, we use $\varphi|_\pi$
to denote the formula obtained by repeatedly co-factoring $\varphi$ using the (possibly
partial) assignment of variables given by $\pi$. As discussed in Sect. 2, we say that
a function $\varphi'(Z)$ is a generalized implicant of $\varphi(Z)$ if $\varphi'(Z) \Rightarrow \varphi(Z)$. This
generalizes the notion of implicants used in the literature, which are restricted
to be cubes. The set of all generalized implicants of $\varphi$ is denoted $\mathrm{GenImpl}(\varphi)$.

A Boolean formula $\varphi(Z)$ can be represented as a circuit or a Directed Acyclic
Graph (DAG) consisting of $\neg$, $\wedge$ and $\vee$ gates, with literals at leaves. Further, it
can be converted to a semantically equivalent formula in Negation Normal Form
(NNF), i.e., with no $\neg$-labelled internal nodes, in time linear in the size of the
circuit. We consider formulas to be given in NNF unless mentioned otherwise, 
and interchangeably refer to a Boolean formula and the circuit representing it. 
If an NNF formula in Conjunctive Normal Form (CNF), i.e., as conjunction of 
clauses, is unsatisfiable, then there is a subset of its clauses whose conjunction is 
unsatisfiable. This set is called its unsatisfiable core, and a minimal unsatisfiable 
core is one without any proper subset that is also an unsatisfiable core. 

The Boolean functional synthesis problem, and notions of Skolem functions
and Skolem function vectors have already been defined in Sect. 1. Let $\varphi(X, Y)$
be a Boolean relational specification over inputs $X$ and outputs $Y$. A commonly
used approach, adopted by several Boolean functional synthesis algorithms [6,
14-16], works as follows. Without loss of generality, let $y_1 \prec \cdots \prec y_n$ be a
linear ordering of the outputs in $Y$. We first define a set of derived specifications
$\varphi^{(i)}(X, Y_i^n)$ for all $i \in \{1, \ldots, n\}$, where $\varphi^{(i)} \Leftrightarrow \exists Y_1^{i-1}\, \varphi(X, Y)$. Next, for each
$i \in \{1, \ldots, n\}$, we find a Skolem function for $y_i$ from the derived specification
$\varphi^{(i)}(X, Y_i^n)$, by treating $y_i$ as the sole output and all of $X, Y_{i+1}^n$ as inputs in
$\varphi^{(i)}$. Let $\psi_i(X, Y_{i+1}^n)$ denote the Skolem function for $y_i$ thus obtained. Finally,
we substitute the Skolem functions $\psi_{i+1}, \ldots, \psi_n$ for $y_{i+1}, \ldots, y_n$ respectively in
the Skolem function $\psi_i$ obtained above. This gives a Skolem function for $y_i$ only
in terms of $X$. By repeating the above process for all $i$ in decreasing order from
$n - 1$ to 1, we obtain a Skolem function vector for $\varphi$.
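
The quantify-then-substitute scheme described above can be sketched on a two-output toy instance. Everything in this block is an assumption made for illustration: the specification `phi`, the function names, and the use of the positive cofactor as the Skolem function at each step (a standard valid choice, not necessarily the one a given tool makes).

```python
from itertools import product

# Illustrative specification over input x and outputs y1, y2 (chosen for brevity).
def phi(x, y1, y2):
    return bool((x and (y1 or y2)) or (not x and not y2))

def phi_2(x, y2):
    """Derived specification for y2: exists y1. phi (y1 is quantified away)."""
    return phi(x, 0, y2) or phi(x, 1, y2)

def psi_2(x):
    """A Skolem function for y2 in phi_2; the positive cofactor is always a valid choice."""
    return phi_2(x, 1)

def psi_1_raw(x, y2):
    """A Skolem function for y1 in phi, with y2 still treated as an input."""
    return phi(x, 1, y2)

def psi_1(x):
    """Back-substitute psi_2 for y2 to express the Skolem function for y1 over x alone."""
    return psi_1_raw(x, psi_2(x))

def realizable(x):
    return any(phi(x, y1, y2) for y1, y2 in product((0, 1), repeat=2))

# (psi_1, psi_2) is a Skolem function vector iff it satisfies phi whenever phi is realizable.
checks = [realizable(x) == phi(x, psi_1(x), psi_2(x)) for x in (0, 1)]
print(checks)
```

The last output in the order is handled first because its derived specification mentions only the inputs; earlier outputs are then resolved by substitution.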


4 A New Knowledge Representation for Skolem 
Functions 


We start with a key definition that is motivated by the desire to represent the 
entire space of Skolem functions arising from a specification compactly, and in 
a form that is easily amenable to well-established logic synthesis and optimiza- 
tion workflows. Recall from Sect.2 that for a multi-output specification, Skolem 
functions for different outputs may be dependent on each other. Hence, the set 
of Skolem function vectors cannot be expressed as a Cartesian product of sets 
of Skolem functions for individual outputs. Instead, we impose a linear order 
on the outputs, and express the Skolem function for one output in terms of the 
inputs and other outputs that precede it in the order. Such a linear order may be 
automatically generated, user-provided, or even generated with guidance from 
the user, e.g., if the user provides a partial order on the outputs. We assume the
availability of such an order $\prec$ in the definition below.


Definition 1. Let $\varphi(X, Y)$ be a specification over a linearly ordered set of outputs
$Y = \{y_1, \ldots, y_n\}$. We say that output $y_i$ has a Skolem basis in $\varphi$ if there
exists a pair of functions $(A_i, B_i)$ over $X \cup Y_{i+1}^n$, such that


1. $A_i \wedge B_i$ is unsatisfiable, and
2. any Skolem function $\psi_i(X, Y_{i+1}^n)$ for $y_i$ in the derived specification $\varphi^{(i)}$ can
be written as $\psi_i = A_i \vee g$ for some $g \in \mathrm{GenImpl}(B_i)$.


We call the vector of pairs $((A_i, B_i))_{1 \le i \le n}$ the Skolem basis vector for $\varphi$ wrt $\prec$.


The Skolem basis vector can be seen as a succinct representation of the
Skolem function space, i.e., the set of all Skolem function vectors of $\varphi$. A natural
question that arises at this point is: Given a specification $\varphi$ and order $\prec$ of
outputs, does there always exist a Skolem basis for $\varphi$ wrt $\prec$? Fortunately, as we
show in this paper, the answer is a resounding “Yes”. Not only that, the Skolem
basis for a given $\varphi$ and $\prec$ is unique up to semantic equivalence of the basis functions.
It is important to note that not every set of functions can be represented
using just two basis functions. This is easy to see via a counting argument: the
number of sets of Boolean functions over $m$ inputs is $2^{2^{2^m}}$. However, the number
of sets that admit a Skolem basis is (loosely) upper bounded by $2^{2^{m+1}}$. Skolem
functions are therefore special, since we show that the space of all Skolem functions
for every output in every specification always admits representation by two
basis functions, regardless of the order $\prec$. Interestingly, although the definition of
the Skolem basis vector requires an order $\prec$ on the outputs, the Skolem function
space itself does not depend on the order.
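
The counting argument can be made concrete for very small $m$. The following sketch (the function name `representable_families` is hypothetical) enumerates every pair $(A, B)$ with $A \wedge B$ unsatisfiable by labelling each input point as on-set, don't-care, or neither, builds the family $\{A \vee \nu : \nu \Rightarrow B\}$, and counts the distinct families. Under this labelling there are exactly $3^{2^m}$ such pairs, which is consistent with the loose bound quoted above.

```python
from itertools import product

def representable_families(m):
    """Distinct families {A or v : v implies B} over m inputs, for all disjoint (A, B)."""
    points = list(product((0, 1), repeat=m))
    families = set()
    # Each input point is in the on-set (A), in the don't-care set (B), or in neither.
    for labels in product(("A", "B", "off"), repeat=len(points)):
        dc = [i for i, lab in enumerate(labels) if lab == "B"]
        base = [1 if lab == "A" else 0 for lab in labels]
        family = set()
        for choice in product((0, 1), repeat=len(dc)):
            table = list(base)
            for i, bit in zip(dc, choice):
                table[i] = bit
            family.add(tuple(table))
        families.add(frozenset(family))
    return families

m = 2
num_sets_of_functions = 2 ** (2 ** (2 ** m))   # all sets of m-input Boolean functions
loose_bound = 2 ** (2 ** (m + 1))              # number of arbitrary pairs (A, B)
num_representable = len(representable_families(m))
print(num_sets_of_functions, loose_bound, num_representable)
```

For $m = 2$ this prints 65536, 256 and 81, so only a small fraction of all sets of functions can be described by two basis functions.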


Proposition 1. Suppose $\Psi$ is a Skolem function vector for the outputs $Y$ in
terms of inputs $X$ in $\varphi$. Then, for any order $\prec$, $\Psi$ can be generated using the
Skolem basis vector of $\varphi$ wrt $\prec$, and then substituting, for each $i \in \{1, \ldots, n\}$, the
Skolem functions $\psi_j$ for $y_j$ where $i < j \le n$, in the Skolem function for $y_i$.


Proof Sketch: With ordering $y_1 \prec y_2 \prec \cdots \prec y_n$, let $((A_i, B_i))$ be the corresponding
Skolem basis vector. The support of $A_n, B_n$ consists only of the inputs $X$, while the
support of $A_i, B_i$ (for $i < n$) is $X \cup \{y_{i+1}, \ldots, y_n\}$. Let $\Psi = (\psi_1, \ldots, \psi_n)$ be an
arbitrary Skolem function vector, where each $\psi_i$ is a function of $X$. By definition
of Skolem basis, since $\psi_n$ is a Skolem function for $y_n$, it can be obtained from
$A_n$ and $B_n$ (each of which has support $X$). Now consider $y_i$ for $1 \le i < n$.
By definition of Skolem basis, every Skolem function for $y_i$ in terms of $X \cup
\{y_{i+1}, \ldots, y_n\}$ can be obtained from $A_i$ and $B_i$. In particular, if we set $y_{i+1}$ to
$\psi_{i+1}$ and so on until $y_n$ to $\psi_n$, every Skolem function for $y_i$ in terms of $X$ can
be obtained from $A_i$ and $B_i$.

Another interesting property of the Skolem basis vector is that, when it
exists, it is unique. Later we will show (constructively) that it always exists,
and hence the construction yields this unique basis vector.




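Proposition 2. For any $y_i$ in $\varphi$, its Skolem basis, when it exists, is unique.


Proof. Fix $i$. Let $S$ be the set of all Skolem functions for $y_i$ in $\varphi$. From Definition 1,
we know that for all $f \in S$, $A_i \Rightarrow f$. Hence, $A_i \Rightarrow \bigwedge_{f \in S} f$. However, we
also know that $A_i \in S$ (corresponds to choosing the generalized implicant 0 from
$\mathrm{GenImpl}(B_i)$). Therefore, $(\bigwedge_{f \in S} f) \Rightarrow A_i$. It follows from the two implications
that $A_i \Leftrightarrow \bigwedge_{f \in S} f$.

In a similar manner, Definition 1 implies that for all $f \in S$, $f \Rightarrow A_i \vee B_i$.
Hence $(\bigvee_{f \in S} f) \Rightarrow A_i \vee B_i$. However, we know that $A_i \vee B_i \in S$ (corresponds to
choosing the generalized implicant $B_i$ from $\mathrm{GenImpl}(B_i)$). Therefore, $A_i \vee B_i \Rightarrow
\bigvee_{f \in S} f$. It follows from the two implications that $A_i \vee B_i \Leftrightarrow \bigvee_{f \in S} f$; since
$A_i \wedge B_i$ is unsatisfiable, $B_i \Leftrightarrow (\bigvee_{f \in S} f) \wedge \neg A_i$.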


Finally, we explain how our new representation of Skolem functions using 
a Skolem basis vector naturally lends itself to easy processing by downstream 
logic synthesis and optimization tools. Thus, a Skolem basis vector is not just an 
arbitrary way to represent the space of all Skolem function vectors; instead, it 
is strongly motivated by the way modern logic synthesis and optimization tools 
work to search the semantic space of partially specified functions (i.e. functions 
specified with on-sets and don’t-care sets). Specifically, in logic synthesis and 
optimization parlance [29], $A_i$ is the on-set and $B_i$ is the don’t-care set for Skolem
functions for $y_i$ in $\varphi$. In other words, $A_i$ describes all assignments for which every
Skolem function for $y_i$ must evaluate to 1 while $B_i$ describes those assignments
on which a Skolem function can evaluate to either 1 or 0 without violating
the requirement of being a Skolem function for $y_i$ in $\varphi$. Thus, every semantically
distinct Skolem function for $y_i$ in $\varphi$ can be obtained by choosing a distinct subset
of satisfying assignments of $B_i$ and choosing the Skolem function to evaluate to 1 on
this subset of assignments in addition to those determined by $A_i$. Indeed, state-
of-the-art logic synthesis and optimization tools (such as abc [27]) use on-sets 
and don’t care sets expressed as Boolean functions to represent the space of all 
realizations of a partially specified function. The don’t cares are then used to 
optimize the semantic and implementation choices when choosing the optimal 
realization of such a partially specified function, as per user provided criteria 
like area, gate count, delay, power consumption, balance of delays across paths 
etc. Indeed, the following guarantee follows rather trivially from Proposition 1. 


Proposition 3. Suppose we have access to a logic optimization tool that finds
the optimal semantic and implementation choice of a partially specified function
as per user criteria. Using this tool on the Skolem basis vector of $\varphi$ wrt $\prec$ yields
the optimal choice among all Skolem functions, where optimality of the Skolem
function for $y_i$ is conditioned on the choice of Skolem functions for $y_j$, for $1 \le j < i$.


Having defined and motivated the Skolem basis vector as our new knowledge 
representation, in the rest of the paper we will show how it can actually be 
computed, in theory and in practice. 


5 Towards Synthesizing the Skolem Basis Vector 


The Single Output Case: First, we consider the case of a singleton output 
and show that here the existence of Skolem basis is easy to establish, and the 
basis is also easy to compute. 


Theorem 1. For a single-output specification $\varphi(X, y)$, the Skolem basis for $y$
in $\varphi$ is given by $A = \varphi(X,1) \wedge \neg\varphi(X,0)$ and $B = (\varphi(X,1) \Leftrightarrow \varphi(X,0))$. Thus, in
this case, the Skolem basis vector for $\varphi$ can be computed in time/space linear in
the size of the circuit representing $\varphi$.


Proof. Let $2^{|X|}$ denote the set of all complete assignments $\pi$ of $X$. Define $S_1 =
\{\pi \mid \pi \in 2^{|X|}, \pi \models \varphi(X,1)\}$ and $S_0 = \{\pi \mid \pi \in 2^{|X|}, \pi \models \varphi(X,0)\}$. By
definition of $S_0$ and $S_1$ (with $\overline{S_i}$ denoting the complement of set $S_i$), we have:

– $\pi \in S_1 \cup S_0$ iff $\pi \models \exists y\, \varphi(X, y)$.
– $\pi \in S_1 \cap S_0$ iff $\pi \models \forall y\, \varphi(X, y)$.
– $\pi \in \overline{S_1} \cap \overline{S_0}$ iff $\pi \not\models \exists y\, \varphi(X, y)$.
– For every $\pi \in S_1 \cap \overline{S_0}$, the only value of $y$ that makes $\varphi(\pi, y)$ true is 1.
– For every $\pi \in S_0 \cap \overline{S_1}$, the only value of $y$ that makes $\varphi(\pi, y)$ true is 0.

Now let $\psi(X)$ be an arbitrary Skolem function for $y$ in $\varphi(X, y)$. Recall that by
definition a Skolem function satisfies $\forall X\, (\exists y\, \varphi(X,y) \Leftrightarrow \varphi(X, \psi(X)))$. It then
follows from the above observations that if $\pi \in S_1 \cap \overline{S_0}$, $\psi(\pi)$ must evaluate to
1. Similarly, if $\pi \in (S_1 \cap S_0) \cup (\overline{S_1} \cap \overline{S_0})$, it makes no difference whether $\psi(\pi)$
evaluates to 0 or 1. Finally, if $\pi \in S_0 \cap \overline{S_1}$, $\psi(\pi)$ must evaluate to 0. Since $\psi$
was an arbitrary Skolem function for $y$ in $\varphi$, we infer that the Skolem basis for
AllSk($y$) is $(A, B)$, where $A = \varphi(X,1) \wedge \neg\varphi(X,0)$ represents the set $S_1 \cap \overline{S_0}$,
and $B = (\varphi(X,0) \Leftrightarrow \varphi(X,1))$ represents the set $(S_1 \cap S_0) \cup (\overline{S_1} \cap \overline{S_0})$.
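
Theorem 1 can be sanity-checked by exhaustive enumeration on a small instance. The sketch below is an illustration only: the specification `phi` is a hypothetical example, truth tables stand in for circuits, and the helper names are not taken from any tool. It computes $A$ and $B$ as in the theorem and compares the functions of the form $A \vee \nu$ with $\nu \Rightarrow B$ against the set of all Skolem functions found by brute force.

```python
from itertools import product

def skolem_basis_single(phi, n):
    """Truth tables of A = phi(X,1) & ~phi(X,0) and B = (phi(X,1) <-> phi(X,0))."""
    pts = list(product((0, 1), repeat=n))
    A = [int(bool(phi(p, 1)) and not phi(p, 0)) for p in pts]
    B = [int(bool(phi(p, 1)) == bool(phi(p, 0))) for p in pts]
    return pts, A, B

def is_skolem(phi, table, pts):
    """Check forall X: (exists y. phi(X, y)) <-> phi(X, psi(X)) for a truth table psi."""
    return all(bool(phi(p, 0) or phi(p, 1)) == bool(phi(p, v)) for p, v in zip(pts, table))

# Hypothetical specification over X = (x1, x2): y must equal x1 whenever x2 holds.
def phi(x, y):
    x1, x2 = x
    return (not x2) or (y == x1)

pts, A, B = skolem_basis_single(phi, 2)
all_tables = list(product((0, 1), repeat=len(pts)))
skolem = {t for t in all_tables if is_skolem(phi, t, pts)}
from_basis = {
    t for t in all_tables
    if all(a <= v for a, v in zip(A, t))                      # A implies psi
    and all(v <= max(a, b) for v, a, b in zip(t, A, B))       # psi implies A or B
}
print(A, B, skolem == from_basis)
```

On this instance the on-set forces the value 1 at one input point, the complement of $A \vee B$ forces 0 at another, and the two don't-care points give four Skolem functions in total.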


We next consider the multiple output case, where our strategy (as done usually 
for Skolem function synthesis) is to reduce to the one-output case above. 


Multiple Outputs and Existential Quantification: When we have multiple
outputs, from the definition of Skolem basis vector (Definition 1), it follows
that the problem reduces to the single output case, if we can compute the
derived specifications $\varphi^{(i)}(X, Y_i^n)$. Unfortunately, computing $\varphi^{(i)}(X, Y_i^n)$ cannot
always be done efficiently, even when $\varphi(X, Y)$ and the order $\prec$ on $Y$ are
given. We compute $\varphi^{(i)}$ from a given $\varphi^{(i-1)}$, where the next variable to be quantified
is either chosen on-the-fly (giving a dynamic computation of $\prec$) or determined
as per a statically provided order. Since $\varphi^{(i+1)} \Leftrightarrow \exists Y_1^i\, \varphi \Leftrightarrow \exists y_i\, \varphi^{(i)}$ for all
$i \in \{1, \ldots, n-1\}$, we first consider how a single output variable can be quantified
from a derived specification.

The conceptually simplest way to compute $\exists y_i\, \varphi^{(i)}$ is as $\varphi^{(i)}|_{y_i=1} \vee \varphi^{(i)}|_{y_i=0}$.
Unfortunately, this doubles the size of the circuit representation. An alternative
is to find a Skolem function, say $\psi_i$, for $y_i$ in $\varphi^{(i)}$, and then use $\varphi^{(i)}[y_i \mapsto \psi_i]$.
This works well when $\psi_i$ can be represented compactly. However, an NNF
representation of $\psi_i$ can be as large as that of $\varphi^{(i)}$ (e.g. if $\psi_i = \varphi^{(i)}|_{y_i=1}$), in
which case we may double the circuit size. We therefore ask if it is possible
to compute $\exists y_i\, \varphi^{(i)}$ by simply substituting a constant (not necessarily a Skolem
function) for $y_i$ in an NNF formula of almost the same size as $\varphi^{(i)}$. It turns out
that this is possible in two practically relevant cases. In other cases, we transform
the circuit to permit such constant substitutions. For notational convenience, in
the rest of this section, we omit $i$ and use $y$ and $\varphi$ for $y_i$ and $\varphi^{(i)}$.


The Case of Unates: A variable $y$ is positive (resp. negative) unate in $\varphi$ if
$\varphi|_{y=0} \Rightarrow \varphi|_{y=1}$ (resp. $\varphi|_{y=1} \Rightarrow \varphi|_{y=0}$). A variable is unate in $\varphi$ if it is either
positive or negative unate in $\varphi$. The following is then easily proved.


Lemma 1. If $y$ is positive unate in $\varphi$, then $\exists y\, \varphi \Leftrightarrow \varphi|_{y=1}$. Similarly, if $y$ is
negative unate in $\varphi$, then $\exists y\, \varphi \Leftrightarrow \varphi|_{y=0}$.


Proof. The proof follows immediately from the definition of positive and negative
unateness, and from the fact that $\exists y\, \varphi \Leftrightarrow \varphi|_{y=0} \vee \varphi|_{y=1}$.


As an example, consider $\varphi = (x \wedge (y_1 \vee y_2)) \vee (\neg x \wedge \neg y_2)$. Here, $y_1$ is positive unate
in $\varphi$, but $y_2$ is not unate in $\varphi$. However, $y_2$ is negative unate in $\varphi|_{y_1=1}$, which by
Lemma 1 is equivalent to $\exists y_1\, \varphi$. This shows that even if a variable is not unate
to begin with, it may become unate after some variables are quantified. If we use
the order $y_1 \prec y_2$ in our example, both $\exists y_1\, \varphi$ and $\exists y_1 \exists y_2\, \varphi$ can be computed by
substituting constants for $y_1$ and $y_2$ in $\varphi$. This is however not true for $y_2 \prec y_1$.
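
The unateness claims in this example can be verified by truth-table enumeration. The sketch below is a brute-force illustration, not the check a synthesis tool would use (those typically rely on SAT queries); the helpers `cofactor` and `unateness` are hypothetical names, and the formula is the example as reconstructed in this section.

```python
from itertools import product

def phi(x, y1, y2):
    return bool((x and (y1 or y2)) or (not x and not y2))

def cofactor(f, index, value):
    """Fix argument `index` of f to `value`; the result takes the remaining arguments."""
    def g(*args):
        full = list(args[:index]) + [value] + list(args[index:])
        return f(*full)
    return g

def implies_everywhere(f, g, arity):
    return all((not f(*p)) or g(*p) for p in product((0, 1), repeat=arity))

def unateness(f, index, arity):
    """Return (positive_unate, negative_unate) for argument `index` of f."""
    lo, hi = cofactor(f, index, 0), cofactor(f, index, 1)
    return (implies_everywhere(lo, hi, arity - 1),
            implies_everywhere(hi, lo, arity - 1))

y1_unate = unateness(phi, 1, 3)            # y1 in phi
y2_unate = unateness(phi, 2, 3)            # y2 in phi
phi_y1 = cofactor(phi, 1, 1)               # phi with y1 set to 1, over (x, y2)
y2_unate_after = unateness(phi_y1, 1, 2)   # y2 after y1 has been eliminated

# Lemma 1 for y1: exists y1. phi coincides with the positive cofactor.
lemma1_holds = all(
    (phi(x, 0, y2) or phi(x, 1, y2)) == phi_y1(x, y2)
    for x, y2 in product((0, 1), repeat=2)
)
print(y1_unate, y2_unate, y2_unate_after, lemma1_holds)
```

The result shows $y_1$ positive unate, $y_2$ not unate in the original formula, and $y_2$ negative unate once $y_1$ has been replaced by the constant 1.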

Fig. 2. NNF circuit representations of formulas $\varphi_1$, $\varphi_1^+$, $\varphi_2$, $\varphi_3$.


In general, given a specification $\varphi(X, Y)$ and a linear ordering $\prec$ of outputs,
if each output $y_i$ is unate in the derived specification $\varphi^{(i)} \Leftrightarrow \exists Y_1^{i-1}\, \varphi$, then we
can apply Lemma 1, Definition 1 and Theorem 1 to synthesize the entire Skolem
basis vector for $\varphi$ w.r.t. $\prec$ efficiently. This also suggests a heuristic for finding a
(partial) order on the outputs $Y$. Specifically, given a derived specification $\varphi^{(i)}$,
we try to find an output variable $y$ in its support such that $y$ is unate in $\varphi^{(i)}$. If
such a variable exists, we use it as the next variable in the $\prec$ order, and obtain
$\varphi^{(i+1)}$ by using Lemma 1 to compute $\exists y\, \varphi^{(i)}$. As our experiments show (see
Sect. 7) and has also been observed elsewhere [14], this approach is surprisingly
effective for finding Skolem functions for many benchmarks.


The Case of No Conflicts: Next, we consider another case where quantification
can be achieved by substituting constants for variables.


Definition 2. Let $\varphi$ be an NNF formula, $y \in \mathrm{sup}(\varphi)$. Suppose we replace every
occurrence of $\neg y$ in $\varphi$ by a fresh variable $\bar{y}$ ($\bar{y} \notin \mathrm{sup}(\varphi)$). The resulting formula
is called the $y$-positive form of $\varphi$ and is denoted $\varphi^{+y}$. The variable $y$ is said to
be in conflict in $\varphi$ if there exists an assignment $\pi : \mathrm{sup}(\varphi) \setminus \{y\} \to \{0,1\}$ such
that $\varphi^{+y}|_\pi \Leftrightarrow y \wedge \bar{y}$. Otherwise, we say that $y$ is conflict-free in $\varphi$.


The assignment $\pi$ in the above definition is called a counterexample to
conflict-freeness of $y$ in $\varphi$. It is easy to see that both $y$ and $\bar{y}$ are positive unate in $\varphi^{+y}$.
Henceforth, we use $\varphi^+$ instead of $\varphi^{+y}$ when $y$ is clear from the context.

We illustrate conflicts and conflict-freeness in Fig. 2. The $y$-positive form of
$\varphi_1$ is shown as $\varphi_1^+$, where $\bar{y}$ is a fresh variable. Clearly, $y$ is in conflict in $\varphi_1$
since $\varphi_1^+|_\pi \Leftrightarrow y \wedge \bar{y}$ for $\pi : x_1 \mapsto 0, x_2 \mapsto 0$. Similarly, $y$ is in conflict in $\varphi_2$ (as
seen with $\pi : x_1 \mapsto 0, x_2 \mapsto 0$). However, $y$ is not in conflict in $\varphi_3$ as there is no
assignment $\pi$ of $x_1, x_2$ for which $\varphi_3^+|_\pi \Leftrightarrow y \wedge \bar{y}$.


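Lemma 2. If $y$ is conflict-free in $\varphi$, then $\exists y\, \varphi \Leftrightarrow \varphi^+|_{y=1,\bar{y}=1}$.


Proof. Since $y$ is conflict-free in $\varphi$, it follows that $\varphi^+|_{y=1,\bar{y}=1} \Rightarrow \varphi^+|_{y=1,\bar{y}=0} \vee
\varphi^+|_{y=0,\bar{y}=1}$. Since all internal nodes in $\varphi^+$ are labeled by either $\wedge$ or $\vee$, it
also follows that $y$ and $\bar{y}$ are positive unate in $\varphi^+$. Therefore, $(\varphi^+|_{y=1,\bar{y}=0} \vee
\varphi^+|_{y=0,\bar{y}=1}) \Rightarrow \varphi^+|_{y=1,\bar{y}=1}$. The proof is completed by observing that by definition
$\exists y\, \varphi \Leftrightarrow (\varphi|_{y=0} \vee \varphi|_{y=1}) \Leftrightarrow (\varphi^+|_{y=0,\bar{y}=1} \vee \varphi^+|_{y=1,\bar{y}=0})$.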


A notion similar to conflict as defined above was used in [13,20] for defining normal
forms for synthesis. The difference is that unlike in [13,20], we do not require
a pre-specified subset of the support to be set to 1 in the assignment $\pi$. To identify
conflicts, we define a conflict formula $\kappa_{\varphi,y}$ as $(\varphi^+|_{y=1,\bar{y}=1} \wedge \neg\varphi^+|_{y=1,\bar{y}=0} \wedge
\neg\varphi^+|_{y=0,\bar{y}=1})$. By Definition 2, $y$ is conflict-free in $\varphi$ iff $\kappa_{\varphi,y}$ is unsatisfiable.
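
The conflict formula can be evaluated directly on a small NNF tree. The sketch below is a brute-force illustration under stated assumptions: the tuple encoding of NNF formulas, the helper names, and the two example formulas are hypothetical (they are not the formulas of Fig. 2), and exhaustive enumeration stands in for a SAT solver. It also checks the substitution of Lemma 2 on the conflict-free example and shows that the same substitution gives a wrong answer on the conflicted one.

```python
from itertools import product

# NNF formulas as nested tuples: ("lit", name, positive), ("and", f, g), ("or", f, g).
def lit(name, positive=True):
    return ("lit", name, positive)

def eval_positive_form(node, env, y, y_val, ybar_val):
    """Evaluate the y-positive form: literal y reads y_val, literal ~y reads ybar_val."""
    if node[0] == "lit":
        _, name, positive = node
        if name == y:
            return y_val if positive else ybar_val
        return env[name] if positive else 1 - env[name]
    left = eval_positive_form(node[1], env, y, y_val, ybar_val)
    right = eval_positive_form(node[2], env, y, y_val, ybar_val)
    return left & right if node[0] == "and" else left | right

def conflict_counterexamples(phi, y, others):
    """All assignments of `others` that satisfy the conflict formula of phi w.r.t. y."""
    found = []
    for bits in product((0, 1), repeat=len(others)):
        env = dict(zip(others, bits))
        if (eval_positive_form(phi, env, y, 1, 1) == 1
                and eval_positive_form(phi, env, y, 1, 0) == 0
                and eval_positive_form(phi, env, y, 0, 1) == 0):
            found.append(env)
    return found

def exists_y(phi, env, y):
    # phi|y=0 or phi|y=1; in the original formula ~y takes the opposite value of y.
    return eval_positive_form(phi, env, y, 0, 1) | eval_positive_form(phi, env, y, 1, 0)

# Hypothetical examples: (x | y) & (x | ~y) and (x | y) & (~x | ~y).
conflicted = ("and", ("or", lit("x"), lit("y")), ("or", lit("x"), lit("y", False)))
conflict_free = ("and", ("or", lit("x"), lit("y")), ("or", lit("x", False), lit("y", False)))

cex_conflicted = conflict_counterexamples(conflicted, "y", ["x"])
cex_free = conflict_counterexamples(conflict_free, "y", ["x"])
lemma2_ok = all(
    exists_y(conflict_free, {"x": x}, "y")
    == eval_positive_form(conflict_free, {"x": x}, "y", 1, 1)
    for x in (0, 1)
)
substitution_wrong_on_conflict = (
    exists_y(conflicted, {"x": 0}, "y") != eval_positive_form(conflicted, {"x": 0}, "y", 1, 1)
)
print(cex_conflicted, cex_free, lemma2_ok, substitution_wrong_on_conflict)
```

The first formula is semantically just $x$, yet its representation has a conflict at $x = 0$, which anticipates the point made later that conflict-freeness depends on the representation.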


Proposition 4. For $1 \le i \le 4$, there exist $\varphi_i$ with $y_i \in \mathrm{sup}(\varphi_i)$ s.t., (i) $y_1$ is
neither unate nor conflict-free in $\varphi_1$, (ii) $y_2$ is unate but not conflict-free in $\varphi_2$,
(iii) $y_3$ is conflict-free but not unate in $\varphi_3$, (iv) $y_4$ is unate, conflict-free in $\varphi_4$.


The formulas $\varphi_1$, $\varphi_2$, $\varphi_3$ from Fig. 2 satisfy conditions (i), (ii) and (iii) respectively.
For (iv), we consider $\varphi_4 = x \wedge y$, in which $y$ is unate and conflict-free.
Lemmas 1, 2 and Proposition 4 show that both unateness and conflict-freeness
are independently useful, and hence combining them we directly obtain:


Theorem 2. Given $\varphi(X, Y)$ and a linear order $\prec$ on $Y$, if $y_i$ is either unate
or conflict-free in $\varphi^{(i)}$ for all $i \in \{1, \ldots, n\}$, then we can effectively synthesize the
Skolem basis vector in time linear in the size of $\varphi$.


We remark that the implications of Theorem 2 go beyond what can be achieved
by earlier work on normal forms for synthesis [13,20]. Indeed, there are formulas
that are neither in SynNNF nor SAUNF but for which Theorem 2 applies.

Finally, unateness is a semantic property; hence if $y$ is not unate in $\varphi$, it is
not unate in any $\mu$ such that $\varphi \Leftrightarrow \mu$. However, conflict-freeness has a representational
aspect. If $y$ is in conflict in $\varphi$, we can always find another NNF formula
$\mu$ such that (i) $\mu \Leftrightarrow \varphi$, and (ii) $y$ is conflict-free in $\mu$. To see why, note that
if $\mu = (y \wedge \varphi|_{y=1}) \vee (\neg y \wedge \varphi|_{y=0})$, i.e. the Shannon expansion of $\varphi$ w.r.t. $y$, then
$\mu \Leftrightarrow \varphi$ and $y$ is conflict-free in $\mu$. However, taking the Shannon expansion may
not always be the best way to render an output conflict-free, as it often leads
to blow-up in the size of the expanded formula. In the next section, we give a
counterexample guided algorithm to obtain $\mu$ from $\varphi$ and $y$, that works much
more efficiently than Shannon expansion in practice.


6 Counterexample-Guided Rectification 


Recall from the previous section that if $y$ is in conflict in $\varphi(X, Y)$, then there
exists a counterexample (assignment) $\pi : X \cup Y \setminus \{y\} \to \{0,1\}$ such that
$\varphi^+|_\pi \Leftrightarrow y \wedge \bar{y}$. In this section, we discuss how we can use such counterexamples
to transform $\varphi(X, Y)$ to a specification $\mu(X, Y)$ such that $\mu \Leftrightarrow \varphi$ and $y$ is
conflict-free in $\mu$. We call such a transformation rectification of $\varphi$ w.r.t. $y$, and
the resulting formula $\mu$ is said to be rectified w.r.t. $y$.


Lemma 3. Let $\pi$ be a counterexample to conflict-freeness of $y$ in $\varphi(X, Y)$ and
let $\xi$ be a formula satisfying (a) $\mathrm{sup}(\xi) \subseteq X \cup Y \setminus \{y\}$, (b) $\varphi \Rightarrow \xi$, and (c)
$\xi|_\pi$ is unsatisfiable. Define $\tau = \varphi \wedge \xi$ and let $\tau^+$ denote the positive form of $\tau$
w.r.t. $y$. Then the following hold: (i) $\tau \Leftrightarrow \varphi$, (ii) $\pi$ is not a counterexample to
conflict-freeness of $y$ in $\tau$, and (iii) every counterexample to conflict-freeness of
$y$ in $\tau$ is also a counterexample to conflict-freeness of $y$ in $\varphi$.


Algorithm 1 RECTIFYONEOUTPUT($\varphi(X, Y)$, $y$)
1: $\mu := \varphi$
2: repeat
3:     res := SATSOLVE($\kappa_{\mu,y}$)
4:     if res is SAT then
5:         Let $\pi : X \cup Y \setminus \{y\} \to \{0,1\}$ be a satisfying assignment of $\kappa_{\mu,y}$
6:         $\xi$ := PARTIALRECTIFIER($\mu$, $\pi$)
7:         $\mu := \mu \wedge \xi$
8: until res is UNSAT    ▷ All counterexamples to conflict-freeness of $y$ in $\mu$ are removed
9: return $\mu$


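Proof. Since $\varphi \Rightarrow \xi$, it follows that $\tau \Leftrightarrow \varphi \wedge \xi \Leftrightarrow \varphi$. This proves claim (i) of
Lemma 3. Next, note that since $\pi$ is a counterexample to conflict-freeness of $y$
in $\varphi$, we must have $\varphi^+|_\pi \Leftrightarrow (y \wedge \bar{y})$. Since $\xi$ does not have $y$ in its support,
it follows that $\tau^+ \Leftrightarrow \varphi^+ \wedge \xi$. Therefore, $\tau^+|_\pi \Leftrightarrow \varphi^+|_\pi \wedge \xi|_\pi \Leftrightarrow (y \wedge \bar{y}) \wedge \xi|_\pi$.
However, from the premise of Lemma 3, we know that $\xi|_\pi$ is unsatisfiable. Hence
$\tau^+|_\pi$ is false. Specifically, $\tau^+|_\pi \not\Leftrightarrow (y \wedge \bar{y})$, and hence $\pi$ is not a counterexample
to conflict-freeness of $y$ in $\tau$. This proves claim (ii) of Lemma 3. Finally, let
$\pi' : X \cup Y \setminus \{y\} \to \{0,1\}$ be a counterexample to conflict-freeness of $y$ in $\tau$. By
definition, $\tau^+|_{\pi'} \Leftrightarrow (y \wedge \bar{y})$. However, $\tau^+|_{\pi'} \Leftrightarrow \varphi^+|_{\pi'} \wedge \xi|_{\pi'}$. Since all variables in the
support of $\xi$ are assigned by $\pi'$, we must have $\xi|_{\pi'}$ being equivalent to either 0
or 1. If $\xi|_{\pi'}$ is 0, then $\tau^+|_{\pi'}$ must also be 0, a contradiction of $\tau^+|_{\pi'} \Leftrightarrow (y \wedge \bar{y})$.
Therefore, we must have $\xi|_{\pi'}$ equivalent to 1, and hence $\varphi^+|_{\pi'} \Leftrightarrow (y \wedge \bar{y})$ for
$\tau^+|_{\pi'}$ to be equivalent to $(y \wedge \bar{y})$. It follows that $\pi'$ must be a counterexample to
conflict-freeness of $y$ in $\varphi$. This proves claim (iii) of Lemma 3.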


Henceforth, we call a formula $\xi$ satisfying conditions (a), (b) and (c) of
Lemma 3 a partial rectifier of $\varphi$ w.r.t. $y$. Given $\pi$, it is easy to find a partial
rectifier.


Lemma 4. For all $v \in X \cup Y \setminus \{y\}$, let $l_{v,\pi}$ denote $v$ if $\pi[v] = 1$, and $\neg v$ if
$\pi[v] = 0$. Let $\xi_\pi$ be $\neg(\bigwedge_{v \in X \cup Y \setminus \{y\}} l_{v,\pi})$. Then $\xi_\pi$ satisfies conditions (a), (b)
and (c) of Lemma 3.


The proof follows immediately from the observations: (i) $\pi$ is the only satisfying
assignment of $\neg\xi_\pi$, and (ii) $\varphi|_\pi \Leftrightarrow (\varphi^+[\bar{y} \mapsto \neg y])|_\pi \Leftrightarrow (y \wedge \bar{y})[\bar{y} \mapsto \neg y] \Leftrightarrow 0$.
Consequently, $\neg\xi_\pi \Rightarrow \neg\varphi$. Although Lemma 4 gives a partial rectifier, it prevents
only the assignment $\pi$ from being a counterexample to conflict-freeness of $y$ in
$\tau$. Later we will see a partial rectifier that prevents many more assignments
from being counterexamples. For the time being, however, we assume that we
have access to a procedure PARTIALRECTIFIER that takes as inputs $\varphi$ and $\pi$ and
outputs a partial rectifier that satisfies conditions (a), (b) and (c) of Lemma 3.

The above discussion suggests a simple algorithm, shown as Algorithm 1
(RECTIFYONEOUTPUT), for rectifying a specification $\varphi$ w.r.t. an output $y$.

The algorithm first initializes a temporary formula $\mu$ to $\varphi$. It then invokes
a propositional satisfiability (SAT) solver to obtain a satisfying assignment $\pi$
of the conflict formula $\kappa_{\mu,y}$ (defined in Sect. 5 just before Proposition 4). The
assignment $\pi$ serves as a counterexample to conflict-freeness of $y$ in $\mu$, and is
used to obtain a partial rectifier $\xi$ of $\mu$ w.r.t. $y$. The formula $\mu$ is then updated
by conjoining it with $\xi$. Lemma 3 guarantees that this gives a specification
semantically equivalent to $\varphi$, while removing $\pi$ from the set of counterexamples
to conflict-freeness of $y$ in $\mu$. By repeating the process with the updated formula
$\mu$, all counterexamples to conflict-freeness of $y$ in $\mu$ are eventually removed.


Theorem 3. Algorithm RECTIFYONEOUTPUT always terminates with a formula
$\mu$ s.t. $\mu \Leftrightarrow \varphi$ and $y$ is conflict-free in $\mu$.


Proof. The following inductive invariants hold at the end of every iteration of the loop
in lines 2-8, thanks to Lemma 3: (i) $\mu \Leftrightarrow \varphi$, (ii) the set of counterexamples to
conflict-freeness of $y$ in $\mu$ has strictly fewer elements than at the start of the iteration.
Since the set of counterexamples is finite (at most $2^{|X|+|Y|-1}$ elements), eventually
this set must become empty. By definition of the conflict formula, $\kappa_{\mu,y}$ must
be unsatisfiable when this happens. Hence, the algorithm eventually exits the loop
in lines 2-8 and terminates. Since there are no counterexamples to conflict-freeness
of $y$ in $\mu$ on termination, $y$ is indeed conflict-free in $\mu$.
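
The rectification loop can be prototyped in a few lines. The sketch below is a simplified stand-in and not the authors' implementation: NNF formulas are nested tuples (an assumed encoding), exhaustive enumeration replaces the SAT call on the conflict formula, and the partial rectifier is the single-assignment clause of Lemma 4. It is run on the specification $((x_1 \wedge x_2) \vee ((x_2 \wedge x_3) \vee y)) \wedge (\neg y \vee (\neg x_3 \wedge x_4))$ used in the counterexample-generalization discussion of this section.

```python
from itertools import product

def lit(name, positive=True):
    return ("lit", name, positive)

def eval_pos(node, env, y, y_val, ybar_val):
    """Evaluate the y-positive form of an NNF tree: literal ~y reads ybar_val."""
    if node[0] == "lit":
        _, name, positive = node
        if name == y:
            return y_val if positive else ybar_val
        return env[name] if positive else 1 - env[name]
    left = eval_pos(node[1], env, y, y_val, ybar_val)
    right = eval_pos(node[2], env, y, y_val, ybar_val)
    return left & right if node[0] == "and" else left | right

def find_counterexample(mu, y, others):
    """Brute-force stand-in for SATSOLVE on the conflict formula of mu w.r.t. y."""
    for bits in product((0, 1), repeat=len(others)):
        env = dict(zip(others, bits))
        values = (eval_pos(mu, env, y, 1, 1), eval_pos(mu, env, y, 1, 0),
                  eval_pos(mu, env, y, 0, 1))
        if values == (1, 0, 0):
            return env
    return None

def partial_rectifier(pi):
    """Lemma 4 rectifier: a clause that is false exactly on pi (assumes pi is non-empty)."""
    literals = [lit(v, not val) for v, val in pi.items()]
    clause = literals[0]
    for extra in literals[1:]:
        clause = ("or", clause, extra)
    return clause

def rectify_one_output(phi, y, others):
    mu, rounds = phi, 0
    while True:
        pi = find_counterexample(mu, y, others)
        if pi is None:
            return mu, rounds
        mu = ("and", mu, partial_rectifier(pi))
        rounds += 1

X = ["x1", "x2", "x3", "x4"]
phi = ("and",
       ("or", ("and", lit("x1"), lit("x2")), ("or", ("and", lit("x2"), lit("x3")), lit("y"))),
       ("or", lit("y", False), ("and", lit("x3", False), lit("x4"))))
mu, rounds = rectify_one_output(phi, "y", X)

def semantic(f, env, y_val):
    return eval_pos(f, env, "y", y_val, 1 - y_val)  # here ~y gets its real value

equivalent = all(
    semantic(phi, dict(zip(X, bits)), v) == semantic(mu, dict(zip(X, bits)), v)
    for bits in product((0, 1), repeat=4) for v in (0, 1)
)
print(rounds, equivalent, find_counterexample(mu, "y", X))
```

With the Lemma 4 rectifier each round removes exactly one counterexample, so this instance needs one round per counterexample; a rectifier that blocks several assignments at once would need fewer rounds.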


Rectification by Counterexample Generalization: The idea of counterexample
generalization is best illustrated by an example. Consider the specification
$\varphi(X, y) = ((x_1 \wedge x_2) \vee ((x_2 \wedge x_3) \vee y)) \wedge (\neg y \vee (\neg x_3 \wedge x_4))$, wherein $y$ is in
conflict. To see why this is so, consider $\varphi^{+y}$ (henceforth called $\varphi^+$) represented
as an NNF circuit in Fig. 3. Let $\pi$ be an assignment that assigns 1 to $x_1, x_3$ and
0 to $x_2, x_4$. The values in red below the leaves in Fig. 3 represent this assignment.
If we propagate these values upstream to the root of the circuit, we get
the values/formulas shown in red adjacent to internal nodes, as shown in Fig. 3.
This process is akin to constant/symbol propagation in symbolic simulation [30].
Note that the root of the circuit is assigned $y \wedge \bar{y}$ by this process, indicating
that $\varphi^+|_\pi \Leftrightarrow (y \wedge \bar{y})$. Hence, $y$ is in conflict in $\varphi$ and $\pi$ is a counterexample to
conflict-freeness of $y$ in $\varphi$.
Interestingly, the constant/symbol propagation discussed above can yield many more
counterexamples beyond $\pi$. Specifically, let $N$ denote the set of coloured nodes in
the figure. Suppose we cut the circuit at the nodes in $N$, as shown by the dotted line
in Fig. 3. Let the sub-circuit above the cut be denoted $C_N$. Notice that the leaf nodes
of $C_N$ are either nodes in $N$ or leaf nodes of the original circuit corresponding to
$y$ or $\bar{y}$. Now consider any assignment $\pi' : \{x_1, x_2, x_3, x_4\} \to \{0,1\}$ s.t.
when we propagate constants/symbols in the original circuit starting with $\pi'$ at the
leaves, we get the same values as in Fig. 3 at all nodes
in $N$. This ensures that all leaves of $C_N$ have the same constant/symbol as in
Fig. 3. Therefore, further constant/symbol propagation must assign exactly the
same constant/symbol/formula at every internal node of $C_N$ as in Fig. 3. Specifically,
the root node is assigned $y \wedge \bar{y}$, implying that $\pi'$ is a counterexample to
conflict-freeness of $y$ in $\varphi$.

Can we characterize all the counterexamples $\pi'$ obtainable by the above
method? It turns out we can do this. First, note from Fig. 3 that the sub-circuits
rooted at the orange, purple and green nodes represent the Boolean formulas
$x_1 \wedge x_2$, $x_2 \wedge x_3$ and $(\neg x_3 \wedge x_4)$ respectively. Hence, the
counterexamples $\pi'$ obtained above are precisely the satisfying assignments of the
formula $\beta = \neg(x_1 \wedge x_2) \wedge \neg(x_2 \wedge x_3) \wedge \neg(\neg x_3 \wedge x_4)$.
Notice that there are many assignments beyond $\pi$ that satisfy $\beta$, e.g.
$x_1 x_2 x_3 x_4 = 0000$ or $0010$ or $1000$, and so on. Thus, we have truly
generalized the counterexample $\pi$.
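
The claim for this example can be checked by brute force. The following Python sketch is only an illustration, not the implementation used in the paper: it replaces the symbolic propagation described above by a semantic truth-table check, treating $y$ and $\bar{y}$ as independent inputs of the positive form. The function names are assumptions made for this sketch.

```python
from itertools import product

def phi_plus(x1, x2, x3, x4, y, ybar):
    """Positive form of the example: the literal 'not y' is replaced by the symbol ybar."""
    return ((x1 and x2) or (x2 and x3) or y) and (ybar or ((not x3) and x4))

def beta(x1, x2, x3, x4):
    """Generalized counterexample formula read off the three coloured nodes."""
    return (not (x1 and x2)) and (not (x2 and x3)) and (not ((not x3) and x4))

def reduces_to_conflict(xs):
    """True iff phi_plus restricted to xs agrees with (y and ybar) for all y, ybar."""
    return all(bool(phi_plus(*xs, y, yb)) == bool(y and yb)
               for y, yb in product([0, 1], repeat=2))

assignments = list(product([0, 1], repeat=4))
counterexamples = {xs for xs in assignments if reduces_to_conflict(xs)}
beta_models = {xs for xs in assignments if beta(*xs)}
```

For this particular specification the two sets coincide, and both contain the original counterexample (1, 0, 1, 0) together with 0000, 0010 and 1000.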

In general, given a specification $\varphi(X, Y)$, an output variable $y$ and a
counterexample $\pi : X \cup Y \setminus \{y\} \to \{0, 1\}$ to conflict-freeness of $y$
in $\varphi$, we first construct an NNF circuit representing $\varphi^+$. For every node
$n$ in the circuit, let $\varphi^+_n$ denote the sub-formula represented by the
sub-circuit rooted at $n$. Next, we assign values given by $\pi$ to the leaves of the
circuit representing $\varphi^+$ and propagate these values to the root of the circuit.
Let $v_{\pi,n}$ denote the constant/symbol/formula assigned to node $n$ in the circuit by
this process. In other words, $v_{\pi,n} \Leftrightarrow \varphi^+_n|_{\pi}$. We now
choose a subset $N$ of nodes $n$ such that
(i) $\mathrm{sup}(\varphi^+_n) \cap \{y, \bar{y}\} = \emptyset$, (ii) $v_{\pi,n}$ is a
constant, and (iii) every path from a non-$y$, non-$\bar{y}$ leaf to the root passes
through a node in $N$. Such a set $N$ can always be found, for example, by choosing $N$
to be the set of non-$y$, non-$\bar{y}$ leaves. However, as Fig. 3 shows, $N$ need not
include only leaf nodes. Let $\beta_{\pi,N}$ denote the formula
$\bigwedge_{n \in N} (\varphi^+_n \Leftrightarrow v_{\pi,n})$.

Lemma 5. Every satisfying assignment of $\beta_{\pi,N}$ is a counterexample to
conflict-freeness of $y$ in $\varphi$. Moreover, $\neg\beta_{\pi,N}$ satisfies the three
conditions required for a partial rectifier as specified in Lemma 3.


Fig. 3. Circuit representing $\varphi^+$


Proof. Since every path from a non-$y$, non-$\bar{y}$ leaf to the root passes through a
node in $N$, we can use nodes in $N$ and the leaves corresponding to $y$ and $\bar{y}$ to
cut the circuit (as shown in Fig. 3). Let $C_N$ denote the sub-circuit above this cut.
Let $\pi'$ be a satisfying assignment (not necessarily the same as $\pi$) of
$\beta_{\pi,N}$. By definition of $\beta_{\pi,N}$, constant/symbol propagation starting
from $\pi'$ assigns the constant value $v_{\pi,n}$ to every node $n \in N$. It follows
that for all leaf nodes $l$ of the sub-circuit $C_N$, $v_{\pi',l} = v_{\pi,l}$. Hence,
every internal node $m$ of $C_N$ must also have $v_{\pi',m} = v_{\pi,m}$. In particular
the root node gets assigned the same value/symbol/formula that it had when we did
constant/symbol propagation starting from $\pi$. In other words,
$\varphi^+|_{\pi'} \Leftrightarrow \varphi^+|_{\pi}$. However, since $\pi$ is a
counterexample to conflict-freeness of $y$ in $\varphi$, we know
$\varphi^+|_{\pi} \Leftrightarrow (y \wedge \bar{y})$. Therefore,
$\varphi^+|_{\pi'} \Leftrightarrow (y \wedge \bar{y})$ and $\pi'$ is a counterexample
to conflict-freeness of $y$ in $\varphi$.

To see that $\neg\beta_{\pi,N}$ satisfies the conditions required of a partial rectifier
in Lemma 3, note that $\mathrm{sup}(\varphi^+_n) \cap \{y, \bar{y}\} = \emptyset$ for
every $n \in N$. Therefore, $\mathrm{sup}(\neg\beta_{\pi,N}) \cap \{y, \bar{y}\}$ is also
empty. Next, by definition, if an assignment $\pi' \models \beta_{\pi,N}$, every node
$n \in N$ in the circuit $\varphi^+$ gets assigned the constant value $v_{\pi,n}$. Using
the same argument as in the first part of the proof, we can then show that
$\varphi^+|_{\pi'} \Leftrightarrow (y \wedge \bar{y})$. Hence
$\varphi|_{\pi'} \Leftrightarrow \varphi^+[\bar{y} \mapsto \neg y]|_{\pi'} \Leftrightarrow (y \wedge \bar{y})[\bar{y} \mapsto \neg y] \Leftrightarrow 0$.
This shows that $\beta_{\pi,N} \Rightarrow \neg\varphi$. In other words,
$\varphi \Rightarrow \neg\beta_{\pi,N}$. Finally,
$\beta_{\pi,N}|_{\pi} \Leftrightarrow \bigwedge_{n \in N} (\varphi^+_n|_{\pi} \Leftrightarrow v_{\pi,n})$.
However, $v_{\pi,n} \Leftrightarrow \varphi^+_n|_{\pi}$ by definition. Hence
$\beta_{\pi,N}|_{\pi} \Leftrightarrow 1$ and hence $\neg\beta_{\pi,N}|_{\pi}$ is
unsatisfiable.


The above lemma allows us to use $\neg\beta_{\pi,N}$ as a partial rectifier of $y$
w.r.t. $\varphi$ in Algorithm RECTIFYONEOUTPUT. Significantly, this eliminates in one
shot all counterexamples to conflict-freeness of $y$ in $\varphi$ that are satisfying
assignments of $\beta_{\pi,N}$, thereby reducing the number of iterations of the loop in
Algorithm RECTIFYONEOUTPUT. As seen in the example above, $\beta_{\pi,N}$ can indeed
have many more satisfying assignments beyond $\pi$. We use this technique to implement
the subroutine PARTIALRECTIFIER in Algorithm RECTIFYONEOUTPUT. Specifically, we choose
the set $N$ such that the longest path of each node $n \in N$ from a leaf of the circuit
is within an empirically determined threshold (20 in our experiments).


Generalizing Using Unsatisfiable Cores: It turns out that we can generalize
counterexamples even beyond what was achieved above. To see a concrete example, consider
the specification $\psi(X, y) = \varphi(X, y) \wedge (\neg y \vee (x_1 \wedge x_2))$,
where $\varphi(X, y)$ is the same specification considered in Fig. 3. The NNF circuit
representing $\psi^{+y}$ (or $\psi^+$ for short) is the same as that shown in Fig. 3
with an additional $\vee$-gate that feeds the root node, and that is fed by the
$\bar{y}$ leaf and the output of the orange node. The same assignment $\pi$ as considered
earlier serves as a counterexample to conflict-freeness of $y$ in $\psi$, and the same
set $N$ can be chosen to obtain the same partial rectifier $\neg\beta$, where
$\beta = \neg(x_1 \wedge x_2) \wedge \neg(x_2 \wedge x_3) \wedge \neg(\neg x_3 \wedge x_4)$.
Note, however, that in the circuit for $\psi^+$, if the orange and purple nodes are
assigned the value 0 by constant propagation starting from an assignment $\pi'$, the
root node must be assigned $y \wedge \bar{y}$, regardless of the value assigned to the
green node. Therefore, we could have used
$\beta' = \neg(x_1 \wedge x_2) \wedge \neg(x_2 \wedge x_3)$, which represents a larger
set of counterexamples than $\beta$. Specifically, $x_1 x_2 x_3 x_4 = 1001$ does not
satisfy $\beta$ but satisfies $\beta'$. It follows that rectification using
$\neg\beta'$ eliminates more counterexamples in one go than rectification using
$\neg\beta$.

In general, given $\varphi$, $y$, $\pi$ and $N$ as in our previous discussion, let
$s_n$ be a fresh variable for every node $n \in N$, and define the formula
$\rho_{\pi,N} = \varphi \wedge \bigwedge_{n \in N} ((s_n \Rightarrow (\varphi^+_n \Leftrightarrow v_{\pi,n})) \wedge s_n)$.
Since $\varphi \Rightarrow \neg\beta_{\pi,N}$ (see Lemma 5) and since
$\beta_{\pi,N} = \bigwedge_{n \in N} (\varphi^+_n \Leftrightarrow v_{\pi,n})$, it follows
that $\rho_{\pi,N}$ is unsatisfiable. Assuming $\varphi$ is satisfiable (otherwise the
synthesis problem is itself trivial), every unsatisfiable core of $\rho_{\pi,N}$ must
set a subset of the $s_n$ variables to 1. Let $U \subseteq N$ be the set of nodes $n$
s.t. $s_n = 1$ in a minimal unsatisfiable core of $\rho_{\pi,N}$. Then
$\rho_{\pi,U} = \varphi \wedge \bigwedge_{n \in U} ((s_n \Rightarrow (\varphi^+_n \Leftrightarrow v_{\pi,n})) \wedge s_n)$
is unsatisfiable.


Lemma 6. Lemma 5 holds with $\beta_{\pi,N}$ replaced by $\beta_{\pi,U}$. Moreover,
$\beta_{\pi,N} \Rightarrow \beta_{\pi,U}$.
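
For the small example above, the effect of core extraction can be imitated by exhaustive search. The sketch below is an assumption-laden stand-in: it does not call a SAT solver or extract a core from selector variables, but simply looks for the smallest subsets $U$ of the three cut nodes for which $\psi$ conjoined with "node $n$ evaluates to $v_{\pi,n} = 0$" for all $n \in U$ is unsatisfiable. The dictionary `NODES` and the helper names are hypothetical.

```python
from itertools import combinations, product

# Sub-formulas at the three cut nodes (names follow the colours used in Fig. 3).
NODES = {
    "orange": lambda x1, x2, x3, x4: x1 and x2,
    "purple": lambda x1, x2, x3, x4: x2 and x3,
    "green":  lambda x1, x2, x3, x4: (not x3) and x4,
}

def psi(x1, x2, x3, x4, y):
    """psi = phi and (not y or (x1 and x2)), with phi as in the running example."""
    phi = ((x1 and x2) or (x2 and x3) or y) and ((not y) or ((not x3) and x4))
    return phi and ((not y) or (x1 and x2))

def satisfiable_with(subset):
    """Is psi satisfiable when every node in `subset` is forced to evaluate to 0?"""
    for *xs, y in product([0, 1], repeat=5):
        if psi(*xs, y) and not any(NODES[n](*xs) for n in subset):
            return True
    return False

def smallest_unsat_subsets():
    """All subsets of minimum size whose constraints make psi unsatisfiable."""
    for size in range(len(NODES) + 1):
        found = [set(c) for c in combinations(NODES, size) if not satisfiable_with(c)]
        if found:
            return found
    return []
```

On this example the search returns only the set containing the orange and purple nodes, which matches the formula $\beta'$ discussed above.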


Algorithm 2 FINDSKBASISVEC($\varphi(X, Y)$)
1: $\alpha := \varphi$; $i := 1$    ▷ Assume $|Y| = n$
2: repeat
3:   $y_i$ := next output variable for which to find a Skolem basis

4: Ai := a1 4 71,,-0 

5: Bi := a — ea =o 

6:   if $y_i$ is positive unate in $\alpha$ then
7:     $\alpha := \alpha|_{y_i = 1}$    ▷ Existentially quantify $y_i$
8:   else if $y_i$ is negative unate in $\alpha$ then
9:     $\alpha := \alpha|_{y_i = 0}$
10:  else
11:    $\mu$ := RECTIFYONEOUTPUT($\alpha$, $y_i$)
12:    $\alpha := \mu^{+}|_{y_i = 1, \bar{y}_i = 1}$
13:  $i := i + 1$
14: until all outputs processed
15: return $((A_i, B_i))_{1 \le i \le n}$


Overall Algorithm: We now present Algorithm FINDSKBASISVEC. The algorithm initializes a
running specification $\alpha$ to $\varphi$. It then repeatedly chooses the next output
$y_i$ for whose Skolem functions a Skolem basis needs to be computed. The choice of
$y_i$ can be as per a static order, or as determined on-the-fly heuristically. The
algorithm then finds a Skolem basis $(A_i, B_i)$ using Theorem 1 by treating $y_i$ as
the sole output in the specification $\alpha$. It next updates the running specification
$\alpha$ by existentially quantifying $y_i$ from $\alpha$. In order to do this, it first
checks if $y_i$ is unate in $\alpha$, and if so, substitutes an appropriate constant for
$y_i$ in $\alpha$ to quantify it out. Otherwise, the algorithm invokes Algorithm
RECTIFYONEOUTPUT. Thanks to Theorem 3, we can effectively and efficiently quantify $y_i$
from $\alpha$ by setting $y_i = 1$ and $\bar{y}_i = 1$ in the positive form of the
formula $\mu$ returned by RECTIFYONEOUTPUT. Once all outputs are processed, the
algorithm outputs the vector of $(A_i, B_i)$ pairs computed as the Skolem basis vector.
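
The unate case of the quantification step can be illustrated on a toy formula. The sketch below assumes the usual definition of positive unateness (the formula at $y_i = 0$ implies the formula at $y_i = 1$) and checks it by enumeration rather than with the SAT-based check used in a real tool. The formula `alpha` and the helper names are hypothetical.

```python
from itertools import product

def is_positive_unate(f, n_inputs):
    """f(xs, y) is positive unate in y if f(xs, 0) implies f(xs, 1) for every xs."""
    return all((not f(xs, 0)) or f(xs, 1)
               for xs in product([0, 1], repeat=n_inputs))

def is_negative_unate(f, n_inputs):
    """f(xs, y) is negative unate in y if f(xs, 1) implies f(xs, 0) for every xs."""
    return all((not f(xs, 1)) or f(xs, 0)
               for xs in product([0, 1], repeat=n_inputs))

def exists_y(f, xs):
    """Existential quantification of y by expansion over both values."""
    return bool(f(xs, 0) or f(xs, 1))

def alpha(xs, y):
    """Toy running specification in which y occurs only positively."""
    x1, x2 = xs
    return (x1 or y) and (x2 or y)

inputs = list(product([0, 1], repeat=2))
# For a positive unate output, substituting y = 1 already quantifies y out.
same_as_substitution = all(exists_y(alpha, xs) == bool(alpha(xs, 1)) for xs in inputs)
```

Here `same_as_substitution` holds because `alpha` is positive unate in `y`; for a negative unate output the same argument applies with the constant 0.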


Theorem 4. Algorithm FINDSKBASISVEC terminates with a Skolem basis vector for the
specification $\varphi(X, Y)$.


Proof. The proof of termination follows immediately from Theorem 3. The proof 
of correctness follows from Definition 1, Theorems 1, 3, and Lemmas 1, 2. 


Though we developed rectification as a technique for rendering a variable
conflict-free with the objective of generating Skolem basis vectors, it can be
independently used to compile a Boolean formula to a form that allows efficient
quantifier elimination. However, a performance evaluation of rectification versus 
other quantification techniques in such applications is beyond the scope of this 
paper. 


7 Implementation and Experiments 


We implemented the above algorithms in C++ using the abc package [27] and 
ran our tool on a set of 602 Boolean functional synthesis benchmarks (also used 
in [12,14]). We used an Intel(R) Xeon(R) CPU E5-2660 v2@2.20GHz machine 
with 40 cores in single-threaded mode (multiple cores used only to run experi- 
ments in parallel). We set an overall timeout of 3600 seconds, within which the 
timeout for unate-check was 1000 seconds. 


Detailed Analysis of Our Results. We did an ablation study to understand which part of
our approach was most successful in compiling the benchmarks. Our results are
summarized in Fig. 4. Here, “Total solves” denotes the number (out of 602) of
benchmarks for which Algorithm FINDSKBASISVEC completed within the timeout. “PAR2 score”
is a widely used weighted performance score, computed as the sum of the time taken (in
seconds) for each solved instance and double the timeout (3600 s) for each unsolved
instance. For benchmarks that were rectified, for each application of rectification, we
verified (using a SAT solver) that the rectified circuit was semantically equivalent to
the original. The time for this verification is included when computing PAR2 scores. In
row 3, we note the “Average time” taken (including for verification), in seconds, over
all solved instances. In rows 4, 5 and 6, we count, respectively, the number of solved
benchmarks where (i) all variables were unate, (ii) some but not all were unate and
(iii) no variables were unate (these add up to row 1). In row 7, we list the number of
solved benchmarks for which there was at least one conflict, i.e., a call to the
rectification algorithm was needed. Row 8 lists the solved benchmarks with at least one
output that was not unate but no outputs having conflicts. The other rows are
self-explanatory.

 #  | Row                          |      DO |      SO |     CDO |     CSO
 1  | Total Solves                 |     287 |     298 |     299 |     308
 2  | PAR2 Scores                  | 3839.56 | 3672.65 | 3696.90 | 3565.01
 3  | Average time                 |  151.28 |   74.29 |  146.94 |   95.24
 4  | allUnates                    |      98 |      98 |      98 |      98
 5  | someUnates                   |     146 |     157 |     151 |     160
 6  | noUnates                     |      43 |      43 |      50 |      50
 7  | fixedConflicts               |      71 |      19 |      73 |      21
 8  | noConflicts                  |     118 |     181 |     128 |     189
 9  | fixedConflicts ∧ someUnates  |      68 |      16 |      69 |      17
 10 | noConflicts ∧ someUnates     |      78 |     141 |      82 |     143
 11 | fixedConflicts ∧ noUnates    |       3 |       3 |       4 |       4
 12 | noConflicts ∧ noUnates       |      40 |      40 |      46 |      46

Fig. 4. Table of results
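
As a sanity check on how the rows relate, the following Python sketch recomputes the PAR2 figures from rows 1 and 3. It rests on one assumption that the text does not state: the tabulated PAR2 score appears to be the penalized total divided by the number of benchmarks (602), since that quotient reproduces row 2 up to rounding. The function names are hypothetical.

```python
TIMEOUT = 3600.0      # seconds, as stated in the experimental setup
N_BENCHMARKS = 602

# (solved instances, average time over solved instances) from rows 1 and 3
RESULTS = {
    "DO":  (287, 151.28),
    "SO":  (298, 74.29),
    "CDO": (299, 146.94),
    "CSO": (308, 95.24),
}

def par2_total(solved, avg_time):
    """Time on solved instances plus twice the timeout for each unsolved instance."""
    unsolved = N_BENCHMARKS - solved
    return solved * avg_time + unsolved * 2 * TIMEOUT

def par2_per_benchmark(solved, avg_time):
    """Assumed normalization: the penalized total averaged over all benchmarks."""
    return par2_total(solved, avg_time) / N_BENCHMARKS

scores = {name: round(par2_per_benchmark(*row), 2) for name, row in RESULTS.items()}
```

Small deviations in the last digit are expected because the average times in row 3 are themselves rounded.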


Order Dependence. Since a Skolem basis vector depends on the ordering of out- 
puts, we considered two order variants. In the first, we considered a heuristically 
determined static order (denoted SO), taken as is from [14]. Then, we tried a
heuristic dynamic order (denoted DO): after each output variable is processed, 
the next is obtained on-the-fly by applying the heuristic from [14]. 


Conflict Optimization in Calculating Skolem Basis Vector. We found several problem
instances where the specification is not realizable, i.e., there exist input values for
which no output values can make the specification true. For such instances, it is
reasonable to restrict the computation of the Skolem basis vector to a set $F$ of Skolem
functions, such that for any Skolem function $\psi \notin F$, there exists
$\psi' \in F$ such that $\psi$ and $\psi'$ differ only on the space of input assignments
for which no assignment of outputs would satisfy the specification. It turns out that
this can be easily encoded in Algorithm 1 by modifying the conflict formula
$\kappa_{\mu,y}$ to $\kappa_{\mu,y} \wedge \varphi(X, Y')$, where $Y'$ is a fresh set of
variables. Doing this, along with the static/dynamic ordering, gives us the “CSO” and
“CDO” columns in Fig. 4.


Observations. With either SO or DO, without conflict optimization, we are able to
compute Skolem basis vectors for 299 of 602 benchmarks (286 were solved by both, 1 by
only DO and 12 by only SO). Interestingly, the static order (SO) had fewer conflicts
compared to the dynamic order (DO), where we had to rectify more often. Further, in the
presence of conflict optimization, we are able to compute Skolem basis vectors for 309
out of 602 benchmarks. Note that even though the PAR2 score is large, the average time
taken is less than 2.5 min, including the time taken for verification. In other words,
when we are able to compute Skolem basis vectors, we are able to do so in a remarkably
short time.


Comparison with Other Tools/Approaches. There are no existing tools that synthesize a
representation of the space of all Skolem function vectors. Knowledge compilation
tools, e.g., C2Syn [13] and NNF2SDD [25,31], come closest as they try to obtain a single
circuit that is semantically equivalent to the original and is in a normal form: the
SynNNF form for C2Syn and the SDD form for NNF2SDD. These tools could hence be potential
alternative approaches. In practice, C2Syn performs refinement operations (see [13]) for
performance boosting, thereby restricting the space of Skolem function vectors. Even
with this optimization, C2Syn can compile only 218 (out of 602) benchmarks, while
NNF2SDD compiles only 142 to SDD on the same computing platform.

An apples-to-apples performance comparison of Boolean functional synthesis tools (that
synthesize a single Skolem function vector) with our tool (that computes Skolem basis
vectors for all Skolem function vectors) is not possible, since two different problems
are being solved. Nevertheless, to understand the performance penalty incurred in
computing a representation of all Skolem function vectors, we observe from [12] that
with a 7200 s timeout and using a more powerful cluster, Manthan [12] (resp. BFSS [14])
could synthesize a single Skolem function vector for ~356 (resp. 247) out of the same
602 benchmarks. In comparison, with a 3600 s timeout, we are able to compute Skolem
basis vectors for ~300 benchmarks. In [17], an improved and highly engineered tool,
Manthan2, was developed, which could synthesize a single Skolem function vector for 502
benchmarks within 7200 s. Interestingly, we are able to compute Skolem basis vectors for
22 benchmarks (out of which 13 have non-unate variables) for which even Manthan2 [17]
fails to synthesize a single Skolem function vector.


8 Conclusion 


In this work, we have introduced a representation for the space of Skolem functions,
using the notion of a Skolem basis vector. Our representation itself is
criteria-agnostic, but allows the use of other existing techniques to optimize Skolem
functions w.r.t. different criteria. We developed a compilation algorithm that uses a
combination of unate and conflict detection along with a generalized
counterexample-guided approach to synthesize the Skolem basis vector. Our next step
would be to identify specific problem contexts and optimization criteria and integrate
our approach with state-of-the-art logic synthesis tools to synthesize specific Skolem
functions satisfying the given criteria.
Abstract. We provide a learning-based technique for guessing a winning strategy in a
parity game originating from an LTL synthesis problem. A cheaply obtained guess can be
useful in several applications. Not only can the guessed strategy be applied as
best-effort in cases where the game’s huge size prohibits rigorous approaches, but it
can also increase the scalability of rigorous LTL synthesis in several ways. Firstly,
checking whether a guessed strategy is winning is easier than constructing one.
Secondly, even if the guess is wrong in some places, it can be fixed by strategy
iteration faster than constructing one from scratch. Thirdly, the guess can be used in
on-the-fly approaches to prioritize exploration in the most fruitful directions.

In contrast to previous works, we (i) reflect the highly structured logical information
in the game’s states, the so-called semantic labelling, coming from the recent
LTL-to-automata translations, and (ii) learn to reflect it properly by learning from
previously solved games, bringing the solving process closer to human-like reasoning.


1 Introduction 


LTL Synthesis [38] is a framework for automatic construction of reactive systems
specified by formulae of linear temporal logic (LTL) [37]. Since LTL is a prominent
logic in the area of safety-critical and provably reliable dynamic systems, LTL
synthesis is a very tempting option to construct such systems since it avoids
error-prone manual implementation; instead it is replaced with the need for a complete
specification of the system (which is not trivial either, but in some cases easier).
However, there is also an important computational caveat: the problem of LTL synthesis
is 2-EXPTIME complete. Despite the infeasibility in the worst case, many heuristics have
been designed that can cope with practical problems, as documented by the yearly
progress in the synthesis competition SYNTCOMP [18], which has had an LTL track for a
number of years. Yet, many reasonable instances even in the benchmark set of SYNTCOMP
still remain practically unsolvable. In this paper, we aim at guessing a solution
through a machine-learning model, even for hard cases, thus possibly providing an
applicable answer, in a sense, without reading the input formula. We achieve that by
learning from other games and by reflecting semantic information, bringing the process
closer to human reasoning.
The classic technique for solving LTL synthesis is to 


1. turn the LTL formula into a deterministic parity automaton (DPA), 

2. turn the DPA (and the partitioning of atomic propositions into system vari- 
ables and environment variables) into a parity game (PG) between the system 
and the environment players, and 

3. solve the PG; any winning strategy of the system player then directly induces 
a system policy (also representable as a circuit) satisfying the LTL formula. 


Due to the worst-case doubly-exponential blowup in the first step and the prac- 
tically bad performance of (Safra’s [39] and others’ [36,40]) determinization pro- 
cedures, this option was rarely used practically until direct, more practical trans- 
lations were given [8,12]. The significantly smaller automata [20] have made this 
approach feasible and, in fact, winning in SYNTCOMP since then. The app- 
roach is implemented in the tool Strix [33], which additionally constructs the 
DPA/PG only partially, on-the-fly until it finds a winning strategy for one of 
the players. This helps to overcome some more cases where the DPA is still very 
large; yet, more complex specifications often remain out of reach. 


Semantic Labelling. The key difficulty in the on-the-fly exploration is a good 
heuristic that prioritizes exploration in promising directions, so that a solution 
can be obtained quickly, without constructing “irrelevant” parts of the game. 

In a concrete state of a PG, is it better to go left or right? While this question 
obviously does not have a simple answer in general, we take a step back and 
instead of a PG we solve the LTL synthesis problem. For instance, consider 
a state of a PG corresponding to satisfying Ga, i.e. “always a holds”. Then, 
the letter {a} is clearly a better choice (for the system) than Ø. The former 
leads to the obligation of satisfying again Ga; the latter to the obligation ff 
(falsifying the formula). Taking the former edge does not guarantee winning, 
but the chances are certainly higher than giving up directly. In order to estimate 
the chances of winning with some obligation, we can evaluate it by randomly 
assigning truth values to temporal subformulae; intuitively, Ga can be true or 
false, so its “trueness” is 0.5, ff has trueness 0. Trueness is examined in [22] and 
utilized in newer versions of Strix [31] as guidance. 
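
The intuition behind trueness can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the definition from [22] or the code in Strix: each temporal subformula is treated as an opaque propositional variable, and the score is the fraction of truth assignments under which the obligation evaluates to true. All names are assumptions made for the sketch.

```python
from itertools import product

def trueness(formula, atoms):
    """Fraction of truth assignments to `atoms` under which `formula` is true."""
    assignments = list(product([False, True], repeat=len(atoms)))
    hits = sum(1 for values in assignments if formula(dict(zip(atoms, values))))
    return hits / len(assignments)

# Temporal subformulae such as "G a" are used as names of propositional variables.
g_a = lambda v: v["G a"]                 # obligation G a
ff = lambda v: False                     # obligation ff
conj = lambda v: v["G a"] and v["F b"]   # obligation G a and F b

scores = {
    "G a": trueness(g_a, ["G a"]),
    "ff": trueness(ff, ["G a"]),
    "G a & F b": trueness(conj, ["G a", "F b"]),
}
```

Under this reading, the obligation Ga scores 0.5 and ff scores 0, so an edge leading to Ga is preferred over one leading to ff.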

Does every state correspond to a goal in LTL? And if so, can we determine which
continuation brings us closer to satisfying it? Recall that the classic translations of
LTL to non-deterministic Büchi automata (NBA), stemming from [43], label the states of
the NBA with a conjunction of LTL formulae, which are the current goals in this state.
For deterministic automata, the situation is inevitably more complex. While the
determinization procedures obfuscated any possible such semantic labelling, the more
recent approach re-established it, e.g., [8] with [26], or [42] with [9]. Beside the
overall goal, it is necessary to also monitor the progress of subgoals. For example,
consider $GF(a \wedge Xb)$, “infinitely often a is followed by b”. No matter what
happens, the goal remains the same. However, whenever a holds, we are progressing with
the subgoal of seeing the a-b sequence once, yielding a subgoal b, which is regarded as
promising.

Fig. 1. Simple game where it is not clear which edges are “winning”.


Our Aim. In this paper, we aim at better guessing of winning decisions than in 
[22,31]. While the previous work only reflected trueness of the main goal, which 
is just the percentage of truth assignments leading to satisfaction of a Boolean 
formula, our approach reflects also (i) the temporal structure of the formulae, 
(ii) the monitored subgoals, and (iii) learns from previously solved games. On 
the technical level, we design over 200 structural features instead of just trueness, 
learn an SVM classifier comparing which edge is most promising, and use data 
from previously solved games, i.e. which edges are “winning”. As it turns out, 
defining this notion already is surprisingly tricky: We cannot simply use the 
output of classical strategy improvement algorithms, as there may be multiple, 
incompatible solutions. Indeed, already for reachability, there are no maximal 
permissive strategies [3], see Fig. 1. Here the edge $(v_2, v_3)$ is winning iff
$(v_3, v_2)$ is not used, and vice versa; using both makes them losing. Nevertheless,
they are “better” than, e.g., the self-loop on $v_1$, which is always losing. Thus, we
want to value both edges between $v_2$ and $v_3$ equally, and higher than the self-loop
on $v_1$.


Our Contribution can be summarized as follows: 


— We learn a model predicting which edge has better chances to be winning. To 
this end, we define features on the semantic labelling in Sect. 5.1, introduce 
a way to measure the degree of “winning” of an edge in Sect. 4, and apply 
learning of support vector machines using our novel ground truth in Sect. 5.2. 

— We evaluate “how winning” the suggested strategy is, i.e. how many wrong 
choices it made, on several inputs in Sect. 6.2. Surprisingly, this value often 
is 0, i.e. our strategy is often winning even for complex formulae, and even 
without reading them (meaning that our strategy is of constant size, inde- 
pendent of the formula, as opposed to a decision table in the concrete game; 
it can be run on the fly with no pre-computation, and decisions depend only 
on the labelling of the current state). 

— Besides, while Strix’s architecture and interface ask for a significantly 
different type of advice (not just for the better of two edges), we show 
Strix already profits from our advice and (modulo our unoptimized advice
implementation) speeds up significantly, as we see in Sect. 6.3.


Usage of our Results: 


— We provide an immediate solution (without even reading the input formula), 
which is often winning; moreover, it is applicable even to games too huge to 
be analyzed in any way. Besides, it is even of a constant size, i.e. independent 
of the size of the state space. 

— Our approach opens the way to (i) a solver based on the semantic labelling, 
for instance, based on strategy iteration only quickly fine-tuning the already 
almost correct guess, and (ii) an on-the-fly-exploration advisor to Strix, with 
the proven potential to be the most efficient among the current techniques. 


Related Work. To the best of our knowledge, there is only one other approach to using
machine learning in LTL synthesis. Here, the authors train a very powerful model (a
hierarchical transformer) in order to directly predict a controller or counterexample
solely from the LTL specification [41]. Further, if their prediction is refuted by a
classical model checking algorithm, they train a separate hierarchical transformer to
repair it [5] until it is correct. While this turns out to be an overall competitive
approach that also manages to solve some instances where classical synthesis tools such
as Strix [33] fail, this does not yield a complete procedure, as the repair loop is not
guaranteed to ever terminate. In this work, we aim to improve existing, complete
procedures such as the one implemented in Strix by means of machine-learning-based
heuristics.


2 Preliminaries 


We introduce notation and provide an overview of necessary background knowl- 
edge. Due to space constraints, we only briefly comment on several topics and 
refer the interested reader to the respective literature. 

We use N to denote the set of non-negative integers. The constants tt and 
ff denote true and false, respectively. 


2.1 Synthesis & Games 


The synthesis problem in its general form asks whether a system can be con- 
trolled such that it satisfies a given specification under any (possible) environ- 
ment. Moreover, one often is interested in obtaining a witness to this query, i.e. 
some controller or strategy which specifies the system’s actions. 


Parity Games are a standard formalism used in synthesis. A parity game is a tuple
$G = ((V, E), v_0, P, p)$, where $(V, E)$ is a finite digraph, $v_0 \in V$ a starting
vertex, $P : V \to \{\mathcal{S}, \mathcal{E}\}$ a player mapping, and
$p : V \to \mathbb{N}$ a priority assignment. Each vertex belongs to one of the two
players $\mathcal{S}$ (called system) and $\mathcal{E}$ (called environment). In other
words, the set of vertices is partitioned into player $\mathcal{S}$’s vertices
$V_{\mathcal{S}}$ and player $\mathcal{E}$’s vertices $V_{\mathcal{E}}$. See Fig. 2 for
an example.

Fig. 2. An example parity game, taken from [22]. Rounded rectangles belong to the
system $\mathcal{S}$ and normal rectangles to the environment $\mathcal{E}$. The
vertices are additionally labelled with their priorities.


Remark 1. In our implementation priorities are assigned to edges instead of ver- 
tices, as this allows for a much more concise representation and suits most 
translations better. However, for ease of presentation, we consider state-based 
acceptance instead of transition-based. 


Playing. To play the game, a token is placed in the initial vertex $v_0$. Then, the
player owning the token’s current vertex moves the token along an outgoing edge of the
current vertex. This is repeated infinitely, giving rise to an infinite sequence of
vertices containing the token $\rho = v_0 v_1 v_2 \cdots \in V^{\omega}$, called a play.
We write $\rho_i$ to refer to the i-th vertex in a play. A play $\rho$ is winning (for
the system player) if the smallest priority occurring infinitely often is odd. (Using
“maximal” instead of “minimal” or “even” instead of “odd” does not fundamentally change
the problem at hand.) Formally, we define
$\inf(\rho) = \{v \in V \mid \forall j.\ \exists k > j.\ \rho_k = v\}$ as the set of
infinitely occurring states. Since the game graph is finite, this set always is
non-empty. The smallest priority occurring infinitely often is given as
$p(\rho) = \min\{p(v) \mid v \in \inf(\rho)\}$ and system wins the play $\rho$ if
$p(\rho)$ is odd.


Strategies. A strategy of player $p$ is a mapping $\sigma_p : V_p \to E$ assigning to
each of $p$’s vertices an appropriate edge along which the token will be moved, i.e.
$(v, \sigma_p(v)) \in E$ for all $v \in V_p$.¹ Once both players fix a strategy, the
game is fully determined and a unique run is induced. We call a strategy of system
$\sigma_{\mathcal{S}}$ winning if for all strategies of the environment
$\sigma_{\mathcal{E}}$ the induced play is winning, i.e. system wins no matter what the
environment does.

For example, consider again the game depicted in Fig. 2. Fixing the strategies
$\sigma_{\mathcal{S}} = \{v_0 \mapsto (v_0, v_2), v_2 \mapsto (v_2, v_3), v_4 \mapsto (v_4, v_4)\}$
and $\sigma_{\mathcal{E}} = \{v_1 \mapsto (v_1, v_2), v_3 \mapsto (v_3, v_3)\}$ induces
the play $v_0 v_2 v_3 v_3 \cdots$. The set of infinitely often seen priorities equals
$\{3\}$, hence the system player wins with these strategies. Moreover, the strategy
$\sigma_{\mathcal{S}}$ is winning, since the play always ends up in either $v_3$ or
$v_4$.
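
The way two positional strategies determine a play, and how the winner is read off, can be sketched as follows. The code is a minimal illustration under stated assumptions: strategies are stored as successor maps (target vertex only) rather than as edges, and the priorities are placeholders except for the priority 3 of $v_3$, which is the only value implied by the example. The helper names are hypothetical.

```python
def induced_play(start, strategy):
    """Follow a combined positional strategy from `start`.

    Returns (prefix, cycle): the play is prefix followed by cycle repeated forever.
    """
    seen, path, v = {}, [], start
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = strategy[v]
    return path[:seen[v]], path[seen[v]:]

def system_wins(cycle, priority):
    """The vertices on the cycle are exactly those visited infinitely often."""
    return min(priority[v] for v in cycle) % 2 == 1

# Strategies from the example, written as vertex -> chosen successor.
sigma_S = {"v0": "v2", "v2": "v3", "v4": "v4"}
sigma_E = {"v1": "v2", "v3": "v3"}
# Placeholder priorities; only the priority of v3 is taken from the text.
priority = {"v0": 2, "v1": 2, "v2": 2, "v3": 3, "v4": 1}

prefix, cycle = induced_play("v0", {**sigma_S, **sigma_E})
```

With these inputs the play is $v_0 v_2$ followed by $v_3$ forever, and the system wins because the only priority seen infinitely often is 3.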


Synthesis. With these notions, we can compactly define the synthesis question: 
Given a parity game G, does there exist a winning strategy for the system player? 
In the example above, $\sigma_{\mathcal{S}}$ is a witness to this question.


¹ Strategies may be more complex, e.g., by using memory. However, “positional”
strategies are sufficient for parity games, thus we omit the general definition.
This problem is still intensely studied due to its broad applications. It also
is one of the few problems which canonically lie in NP ∩ coNP (even in UP ∩
coUP [19]), with recent breakthroughs achieving quasi-polynomial algorithms
[4, 14, 28].


Extensive-Form Game. A common notion in game theory is the extensive-form game.
Intuitively, this means completely “unrolling” the game into an explicit representation.
See e.g. [34, Chp. 5-7] for details. In our case, we consider the game tree, where each
node corresponds to a simple path in the game $G$. Suppose we are in state
$s = (v_1, \ldots, v_i)$ of the game tree. Then, the successors of $s$ are determined by
all successors of $v_i$ in the game, i.e. $\{u \mid (v_i, u) \in E\}$, as follows. If
such a successor $u$ already occurs along $s$, i.e. a loop is closed, we check whether
the corresponding play is winning or losing. In that case, the choice leads to a
corresponding winning or losing leaf of the tree, respectively. Otherwise, i.e. when no
loop is closed by the choice, it leads to $s \circ u$. Essentially, this game tree
represents all potential simple paths (and thus, intuitively, all potential positional
strategies) that can arise in the game, and each edge corresponds to a particular move
of a player (also called ply in game theory). In particular, it is finite, however of
potentially exponential size. Note that we can restrict to simple paths only because
positional strategies are sufficient.


Minimax Game Solving. A fundamental way to solve games is the minimax decision rule,
which intuitively corresponds to exhaustively exploring the extensive-form game (also
discussed in [34]). Suppose we assign a value of 0 to “losing” leaves of the game tree
and a value of 1 to the “winning” leaves. Then, we can “back-propagate” values by
setting $V(s)$ to the maximum of the values of all successors of $s$ if it currently is
the turn of the system player, and to the minimum if instead it is the environment’s
turn (which wants the system to lose). The game is winning if the value in the initial
state of the game tree is 1. This approach is also called backward induction or
retrograde analysis: starting from the winning/losing positions of the game, we
consider all moves which could lead to such situations.
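
The game tree of simple paths and the minimax rule can be combined into a short recursive procedure. The Python sketch below is only meant to illustrate the definitions; it is exponential in the worst case and assumes that every vertex has at least one successor. The example game and all identifiers are hypothetical.

```python
def minimax(path, succ, owner, priority):
    """Value of the game-tree node given by the simple path `path`.

    Returns 1 if the system player wins from this node and 0 otherwise.
    """
    values = []
    for u in succ[path[-1]]:
        if u in path:
            # A loop is closed: the play repeats this cycle forever.
            cycle = path[path.index(u):]
            odd = min(priority[v] for v in cycle) % 2 == 1
            values.append(1 if odd else 0)
        else:
            values.append(minimax(path + [u], succ, owner, priority))
    # System ("S") maximizes, environment ("E") minimizes.
    return max(values) if owner[path[-1]] == "S" else min(values)

# Hypothetical three-vertex game.
succ = {"a": ["b", "c"], "b": ["a"], "c": ["c", "b"]}
owner = {"a": "S", "b": "E", "c": "E"}
winning_priorities = {"a": 2, "b": 2, "c": 1}
losing_priorities = {"a": 2, "b": 2, "c": 2}

value_win = minimax(["a"], succ, owner, winning_priorities)
value_lose = minimax(["a"], succ, owner, losing_priorities)
```

In the first game the system wins by moving to `c`, since every cycle through `c` has smallest priority 1; in the second game all cycles have smallest priority 2 and the value is 0.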


Strategy Improvement (or strategy iteration, abbreviated by SI) is the most
prominent practical way of solving parity games, i.e. answering the synthesis
question. It received significant attention due to recent practical advances
[13, 15, 17, 32] and modern tool developments [6,33]. We explain the approach
briefly, since its details are not important for this work. Intuitively, SI starts 
from arbitrary initial strategies for each player, and then performs the follow- 
ing steps in a loop. First, we check whether either strategy is winning. If yes, 
the algorithm exits, returning this strategy. Otherwise, one of the strategies is 
improved by changing its choices in some vertices. If an improvement is not pos- 
sible, there exists no winning strategy for the respective player. Otherwise, the 
process is repeated with the new strategy. 

This algorithm converges to the correct result in finite time for any initial 
strategy. However, if this initial strategy is chosen “close” to a winning strat- 
egy, then SI intuitively needs to perform fewer steps to converge to an optimal 
one. Thus, a heuristic which often comes up with a “good” initial strategy may
improve the runtime significantly over arbitrary or random initialization. 


2.2 Linear Temporal Logic and Reactive Synthesis 


Linear Temporal Logic (LTL) [37] is a standard logic used to specify desired 
behaviour of a system. The syntax usually is given by 


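$\varphi ::= \mathbf{ff} \mid a \mid \neg\varphi \mid \varphi \wedge \varphi \mid \mathbf{X}\varphi \mid \varphi \mathbf{U} \varphi,$


where $a \in \mathrm{AP}$ is an atomic proposition, inducing the alphabet $\Sigma = 2^{\mathrm{AP}}$. These
formulae are interpreted over infinite sequences $w \in \Sigma^\omega$ called $\omega$-words. A word
$w = w_0 w_1 \cdots \in \Sigma^\omega$ satisfies the next operator $\mathbf{X}\varphi$ iff $\varphi$ is satisfied in the next
step. Similarly, the until operator $\varphi \mathbf{U} \psi$ is satisfied iff $\varphi$ holds until $\psi$ is eventually
satisfied. Usual abbreviations are defined as finally $\mathbf{F}\varphi = \mathbf{tt} \mathbf{U} \varphi$ and globally
$\mathbf{G}\varphi = \neg\mathbf{F}\neg\varphi$, which require that $\varphi$ holds at least once or always, respectively.
Moreover, the construction underlying our work also considers strong release
$\varphi \mathbf{M} \psi = \psi \mathbf{U} (\varphi \wedge \psi)$, (weak) release $\varphi \mathbf{R} \psi = \mathbf{G}\psi \vee (\varphi \mathbf{M} \psi)$, and weak until
$\varphi \mathbf{W} \psi = \mathbf{G}\varphi \vee (\varphi \mathbf{U} \psi)$. Considering these additional operators allows formulas
to be represented in negation normal form, i.e. the negation $\neg$ only appears in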
front of atomic propositions. In the interest of space, we refer to [12] for precise 
definition on the semantics and discussion of these subtleties. Understanding 
these issues is however not required for this work. 
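To illustrate why the additional operators make negation normal form possible, the following sketch pushes negations inward using the standard LTL dualities (U with R, W with M, F with G, X with itself). The tuple encoding of formulas and the function name `nnf` are assumptions made for this example only and do not reflect the data structures of Owl.

```python
# Formulas are nested tuples (hypothetical encoding):
#   ("ap", "a"), ("tt",), ("ff",), ("not", f), ("and", f, g), ("or", f, g),
#   ("X", f), ("F", f), ("G", f), and ("U" | "R" | "W" | "M", f, g).

# Each operator paired with its dual under negation.
DUAL = {
    "and": "or", "or": "and",
    "U": "R", "R": "U",
    "W": "M", "M": "W",
    "X": "X", "F": "G", "G": "F",
    "tt": "ff", "ff": "tt",
}


def nnf(formula, negate=False):
    """Return an equivalent formula with negations only in front of propositions."""
    op = formula[0]
    if op == "not":
        # Flip the pending negation and continue below the operator.
        return nnf(formula[1], not negate)
    if op == "ap":
        return ("not", formula) if negate else formula
    new_op = DUAL[op] if negate else op
    return (new_op,) + tuple(nnf(sub, negate) for sub in formula[1:])


a, b = ("ap", "a"), ("ap", "b")
example = nnf(("not", ("U", a, ("not", b))))      # not (a U not b)
example_globally = nnf(("not", ("G", ("F", a))))  # not G F a
```

Here `not (a U not b)` becomes `(not a) R b`, and `not G F a` becomes `F G (not a)`.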


LTL Synthesis is an instance of the general synthesis problem, where the spec- 
ification to be satisfied is given in form of an LTL formula [38]. Due to recent 
advances [11,12,16,20,21,25], the automata-based approach [43] to LTL synthe- 
sis received significant attention. In particular, the tool Strix [33], built on top 
of Owl [24], which in turn implements these ideas, won several iterations of the 
synthesis competition SYNTCOMP [18]. Essentially, the given LTL formula is 
translated into an w-automaton, which in turn is transformed into a parity game. 
Solving the resulting game yields a solution to the original synthesis question. 

This game is obtained by “splitting” the automaton, as follows. The set 
of atomic propositions is split into system- and environment-controlled
propositions, i.e. $\mathrm{AP} = \mathrm{AP}_S \cup \mathrm{AP}_E$, and the players’ actions correspond to choosing
which of their propositions to enable. Once both players have chosen their propositions’
values, the automaton moves to the next vertex according to the players’ choices.
Concretely, for an automaton state p, the environment can choose to move into
$(p, v)$ where $v \in 2^{\mathrm{AP}_E}$, and from there, the system can move to any automaton state
$q = \delta(p, v' \cup v)$ where $v' \in 2^{\mathrm{AP}_S}$ and $\delta$ is the transition function of the automaton.
In particular, this means that the obtained game is alternating, i.e. system and 
environment take turns in alternation. Moreover, by convention the environment 
moves first. See e.g. [33] for more details on this approach. 
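The splitting step can be sketched as follows. This is a simplified illustration under several assumptions: the automaton is deterministic and complete, its transition function is passed as a Python callable `delta`, and priorities are ignored. The helper names `powerset` and `split_automaton` are hypothetical.

```python
from itertools import chain, combinations


def powerset(props):
    """All subsets of a set of propositions, as frozensets."""
    props = sorted(props)
    subsets = chain.from_iterable(
        combinations(props, r) for r in range(len(props) + 1))
    return [frozenset(s) for s in subsets]


def split_automaton(states, delta, ap_env, ap_sys):
    """Build the moves of the alternating game from an automaton.

    env_moves maps an automaton state p to intermediate vertices (p, v);
    sys_moves maps (p, v) to the automaton states delta(p, v' | v).
    """
    env_moves, sys_moves = {}, {}
    for p in states:
        env_moves[p] = [(p, v) for v in powerset(ap_env)]
        for v in powerset(ap_env):
            sys_moves[(p, v)] = [delta(p, v2 | v) for v2 in powerset(ap_sys)]
    return env_moves, sys_moves


# Toy automaton: move to state 1 iff the grant "g" matches the request "r".
def delta(p, letter):
    return 1 if ("r" in letter) == ("g" in letter) else 0


env_moves, sys_moves = split_automaton([0, 1], delta, {"r"}, {"g"})
```

For the vertex `(0, {r})`, the system may either leave `g` disabled (reaching state 0) or enable it (reaching state 1), which reflects that the environment moves first and the system answers.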


Semantic Translations from LTL to automata are the key ingredient to our
approach. On top of providing a parity game, they also give a semantic labelling,
i.e. interpretable meaning, to the game’s vertices. In particular, the approach
introduced in [8] (see also [10-12]) and implemented in Owl [25] intuitively yields
for each vertex a list of LTL formulae, which roughly correspond to (sub-)goals
which still have to be fulfilled, possibly repetitively.

Fig. 3. Motivational example to provide guidance through semantic labelling.


2.3 Our Goal 


In this work, we want to demonstrate that this semantic labelling can be
efficiently exploited for reactive synthesis. As a motivating example for considering
semantic labelling, we display a (vastly simplified) labelled game in Fig. 3. We
are offered the choice between a and $\neg a$. While it is not completely clear
that choosing a is indeed better, it certainly seems to be more promising, as 
the subsequent labelling seems much “easier” to handle. Thus, faced with a 
choice, we likely would first try to win with a. Observe that without the seman- 
tic labelling, our best option in this situation would be a random guess. In [22], 
the authors used a simple, manually designed mechanism trying to capture this 
notion, called trueness. Motivated by the (surprisingly good) results of this app- 
roach, we want to tackle this problem by more sophisticated means. Concretely, 
we want to make meaningful decisions based on the labelling. However, while 
the theory underpinning semantic translations is quite clean and pleasant [12], 
the actual labellings appearing in practice are quite complex. To further com- 
plicate things, the highly optimized implementation thereof [25] employs several 
subtle optimizations and special cases. We provide an example to showcase the 
complexity of this labelling in practice later in Sect. 5, kept brief in the interest 
of space, and a small real-world example in [23, Appendix A.1]. Since we have 
a simple intuition which however seems difficult to formalize, we opt to tackle 
this problem through means of machine learning. 


3 Previous Approaches and Their Limitations 


In this section, we briefly summarize the ideas of [22] and the inherent problems 
associated with them. The primary motivation of [22] is to exploit the seman- 
tic labelling provided by [25], which gives us an indication of the long term 
goals in the game. As an analogy, consider the game of chess. Here, the “seman- 
tic labelling” is given by the board state, i.e. the position of each piece. This 
labelling provides us with a reasonable indication of (i) our current situation 
and (ii) which moves might be better than others. In particular, understanding 
and evaluating the semantics of the game is what allows humans to have a good
intuition about the quality of moves, without thinking through the intractably 
large game tree. Likewise, this understanding is what enabled algorithms to per- 
form beyond human capabilities. 


3.1 Parity Game Solving by Trueness 


A central notion of [22] is trueness, an approximation of how close a formula is 
to being satisfied, i.e. tt. The intuition is that the semantic labelling of states 
effectively describes “goals” of the system player. If the formula is tt, the system 
has satisfied all goals and consequently won the game. Likewise, increasing the 
trueness is indicative of a good move. Remaining with the analogy of chess,
trueness somewhat corresponds to counting the number of pieces on the board (or 
rather the difference between our and the opponent’s pieces): If no enemy pieces 
remain, we certainly have won, and a change of this difference, i.e. capturing an 
enemy piece or avoiding capture of own pieces, is a good indicator for the quality 
of a move. In particular, this prohibits us from taking moves which immediately 
lead to a piece being taken. 

In [22], the authors propose two ideas. First, they suggest using a trueness-
maximizing strategy as the initial one for strategy iteration, i.e. in each state selecting
the edge which maximizes (or minimizes, in the case of the environment) the obtained trueness.
Second, they use Q-Learning, a popular reinforcement learning approach, as 
a solver for parity games, i.e. as competitor to strategy iteration, using three 
different reward signals. There, each edge is given a reward, which is mostly 
based on (the change of) trueness, and these values then are back-propagated 
until choosing optimal rewards in each step yields a winning strategy. 

While they also show Q-Learning to be an interesting avenue, we primarily 
focus on the “initializing strategy iteration” approach, since our goal is to
augment existing strategy iteration solvers. Moreover, the experimental evaluation
of [22] suggests that Q-Learning scales poorly to large real-world formulae. 


3.2 Problems 
We now outline two key issues of this approach. 


Myopic Trueness The primary heuristic in [22] is trueness. While this approach
already performs surprisingly well, especially for so-called safety and
co-safety formulae, it fails to take into account temporal dependencies;
trueness is myopic. Again, considering chess, while counting the change of pieces
does help us avoid “obviously stupid” moves, it does not stop us from moving 
pieces into positions where they are effectively guaranteed to be taken even- 
tually and does not allow for sacrificing a piece in exchange for a long-term 
advantage. 

Manual Design Their reward functions were defined manually, in contrast to 
being obtained from a learning process. While the intuition behind these 
definitions is reasonable, obtaining a guidance heuristic as a result of an opti- 
mization process is a much more principled approach. 
We proceed to outline how we tackle these issues by a more sophisticated
approach.


4 A New Hope 


We want to improve reactive synthesis by applying machine learning. As already 
motivated by [22], we want to approach this problem by identifying “promising” 
edges, choosing those as initial strategy for SI. Naturally, as a first step, we need 
training data for our learning approach. In particular, we need to identify which
choices in games actually are good, i.e. the ground truth. As it turns
out, this is more complicated than one might expect. 


4.1 Obtaining Training Data with SI 


As SI allows us to solve a game and determine winning edges, one might try to 
employ SI for obtaining a ground truth (as we did initially). However, SI actually 
provides us with potentially misleading or even conflicting information! As we 
already hinted in the introduction through Fig. 1, SI cannot give us a canonical 
ground truth. In the example, one edge is winning iff the other is not used, and 
vice versa. Thus, SI will yield a strategy which does not take both edges and we 
would consider one of them losing. Moreover, note that there is no fundamental 
reason to prefer one edge over the other, so SI might in one run classify the edge 
from vs to v3 as good and in a second run (or on a similar game) do the opposite 
or even consider neither winning. The underlying problem is that parity games 
do not allow for a unique maximally permissive strategy (see e.g. [3]), thus we 
cannot derive the “suitability” of an edge from a single solution strategy. 


4.2 Solving the Game Tree 


Instead of using a particular strategy obtained from SI, we therefore propose to 
identify “all” solutions, i.e. all edges which are part of a winning strategy. More 
formally, for each vertex v we want to determine the value of each outgoing edge 
in the corresponding game tree rooted at v. To prefer “shorter” solutions over
longer ones, we add a beta-decay to the value. Concretely, suppose we consider the
game tree state $s = (v_1, \dots, v_i)$ which ends in a system state $v_i$. Then, the value
of s is defined by $\mathrm{val}(s) = \beta \cdot \max_{s' \in \mathrm{successors}(s)} \mathrm{val}(s')$ for a fixed $0 < \beta < 1$.

As we already mentioned, this tree is intractably large to evaluate, namely
exponential in the size of the game, which itself is already doubly-exponential in
the input formula [27,38]. Thus, we employ a classical technique of game theory. 
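For very small games, the beta-decayed value can still be computed exactly by unfolding all simple paths. The sketch below does so under explicit assumptions that the text leaves open: a winning leaf has value 1 and a losing leaf value 0, environment states take the minimum and are decayed in the same way as system states, every vertex has at least one successor, and the winner of a closed loop is decided by a user-supplied predicate `loop_is_winning`. All names are hypothetical.

```python
def tree_value(path, successors, is_system, loop_is_winning, beta=0.9):
    """Exact beta-decayed value of the game-tree node `path` (a tuple of vertices)."""
    values = []
    for u in successors[path[-1]]:
        if u in path:
            # The move closes a loop: the play is decided, we reach a leaf.
            loop = path[path.index(u):]
            values.append(1.0 if loop_is_winning(loop) else 0.0)
        else:
            values.append(tree_value(path + (u,), successors, is_system,
                                     loop_is_winning, beta))
    best = max(values) if is_system[path[-1]] else min(values)
    return beta * best


# Toy game: vertex 0 belongs to the system, vertices 1 and 2 to the environment.
successors = {0: [1, 2], 1: [1], 2: [0]}
is_system = {0: True, 1: False, 2: False}
priority = {0: 0, 1: 1, 2: 2}


def loop_is_winning(loop):
    # Convention assumed for this sketch only: the largest priority must be even.
    return max(priority[v] for v in loop) % 2 == 0


value = tree_value((0,), successors, is_system, loop_is_winning)
```

In the toy game, moving to vertex 1 traps the play in a losing loop (value 0), while moving to vertex 2 closes a winning loop after two plies, so the root value is the square of beta.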


4.3 Monte Carlo Tree Search (MCTS) 


Intuitively, we explicitly unfold the tree up to a specified depth, e.g. 7 plies, 
and then assign the results of (guided) random sampling to the occurring leaves, 
approximating the (beta-decayed) value of the game in these vertices. 


We describe our method to approximate the value of a node $s = (v_1, \dots, v_i)$
in the game tree. In essence, starting from $v_i$, we randomly select successors,
with the following restrictions for each player. The environment plays optimally, 
i.e. if a state is winning for the environment (which we can determine beforehand 
through classical approaches) we immediately stop sampling and return a value 
of 0. Otherwise, the environment heuristically tries to delay the play as long 
as possible (decreasing the value the system player obtains due to beta-decay). 
In contrast, the system player checks in a one-step lookahead if a choice is 
trivially winning, i.e. leading to a state labelled tt, always choosing such an 
edge if one exists. Otherwise, the system randomly chooses among edges which
are not trivially losing, i.e. do not lead to an ff state. If either player closes a loop, i.e.
selects a successor which already occurs along the path, we determine the value
by checking if the loop is winning or losing. A loss yields a value of 0, while
a win yields $\beta^{\mathrm{length}}$. In summary, we approximate the probability of winning
by playing randomly (avoiding obvious mistakes) against an optimal opponent, 
under-approximating the true value. We deliberately opt for this random-choice 
approach to prefer regions where there is less potential for error. 
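A single playout of this sampling scheme might look as follows. This is only a sketch: the delay heuristic of the environment is not specified in the text, so the code substitutes a placeholder (avoid closing a loop whenever possible); the exponent of the decay is assumed to be the number of moves made during the playout; and the parameters `label`, `env_winning`, and `loop_is_winning` are hypothetical inputs.

```python
import random


def rollout(start, successors, is_system, label, env_winning, loop_is_winning,
            beta=0.9, rng=random):
    """Play one guided random playout from the game-tree node `start`."""
    path = list(start)
    while True:
        v = path[-1]
        moves = len(path) - len(start)
        if v in env_winning or label.get(v) == "ff":
            return 0.0                      # known to be lost for the system
        if label.get(v) == "tt":
            return beta ** moves            # all goals satisfied
        succ = successors[v]
        if is_system[v]:
            trivially_winning = [u for u in succ if label.get(u) == "tt"]
            not_losing = [u for u in succ if label.get(u) != "ff"]
            if trivially_winning:
                u = trivially_winning[0]    # one-step lookahead
            else:
                u = rng.choice(not_losing or succ)
        else:
            # Placeholder for the delay heuristic: prefer unvisited successors.
            fresh = [u for u in succ if u not in path]
            u = rng.choice(fresh or succ)
        if u in path:                       # a loop is closed
            loop = path[path.index(u):]
            return beta ** (moves + 1) if loop_is_winning(loop) else 0.0
        path.append(u)


def estimate(start, n, **game):
    """Average of n playouts, approximating the value of `start`."""
    return sum(rollout(start, **game) for _ in range(n)) / n


game = dict(successors={0: [1, 2], 1: [1], 2: [2]},
            is_system={0: True, 1: False, 2: False},
            label={1: "tt", 2: "ff"},
            env_winning=set(),
            loop_is_winning=lambda loop: False)
sample = rollout((0,), **game)
```

Since each playout extends a simple path until a terminal state or a loop is reached, it terminates after at most as many moves as there are vertices.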


4.4 Optimizations 


While MCTS makes approximation of the game tree value feasible, we added 
several further technical improvements to arrive at a practically viable method. 


SCC Decomposition. We exploit the structure of the game by decomposing it into 
its strongly connected components (SCCs) and put them in reverse topological 
order. Computing (or approximating) the value in that order allows for caching: 
Once a run in the game tree leaves an SCC, it can only reach SCCs further down 
in the topological order, and, since we compute values in this order, the value of 
the reached state is already known, allowing us to re-use it immediately. 
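One way to obtain the SCCs in a suitable order is Tarjan's algorithm, which emits every SCC only after all SCCs reachable from it, i.e. in reverse topological order. The following sketch is a plain recursive version (the function name is hypothetical, and for large games an iterative variant would be needed to avoid Python's recursion limit).

```python
def sccs_reverse_topological(successors):
    """Tarjan's algorithm; returns SCCs such that reachable SCCs come first."""
    index, low = {}, {}
    stack, on_stack = [], set()
    result = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for u in successors[v]:
            if u not in index:
                visit(u)
                low[v] = min(low[v], low[u])
            elif u in on_stack:
                low[v] = min(low[v], index[u])
        if low[v] == index[v]:
            # v is the root of an SCC: pop its members from the stack.
            scc = []
            while True:
                u = stack.pop()
                on_stack.discard(u)
                scc.append(u)
                if u == v:
                    break
            result.append(scc)

    for v in successors:
        if v not in index:
            visit(v)
    return result


graph = {0: [1], 1: [0, 2], 2: [3], 3: [2]}
order = sccs_reverse_topological(graph)
```

In the example, the component {2, 3} cannot reach {0, 1} and is therefore emitted first, so its values would already be cached once a run leaves {0, 1}.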


Pruning. In addition to employing the MCTS values as game values in the tree 
expansion, we also use it to prune the game tree. In particular, once we computed 
the Monte Carlo values for each state, we restrict the choice of the environment 
to the successors which yield (close to) the lowest Monte Carlo value (recall that 
the environment prefers lower values). We empirically chose 0.02 as a threshold, 
i.e. we only keep those edges for the environment which are within 0.02 value 
of the lowest decision. While in theory this might remove crucial paths due to 
statistical fluctuations of MCTS, in practice it allows for a much deeper game 
tree, which in our experiments heavily outweighed the theoretical downside. 
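The pruning rule for environment states amounts to a simple filter, sketched below. The function name and the dictionary-based interface are illustrative assumptions; the threshold of 0.02 is the value reported above.

```python
def prune_environment_moves(successor_values, threshold=0.02):
    """Keep only successors whose Monte Carlo value is close to the lowest one."""
    lowest = min(successor_values.values())
    return [u for u, value in successor_values.items()
            if value <= lowest + threshold]


kept = prune_environment_moves({"a": 0.50, "b": 0.515, "c": 0.60})
```

Here the successors `a` and `b` are retained, whereas `c` is considered too favourable for the system to be chosen by the environment and is pruned.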


5 Handling the Truth 


We introduced a way to obtain a well-founded notion of “value” (to be
precise, an approximation thereof) for a choice, i.e. an indication of how good this
choice is. As such, we can rank edges by their value in each state. Intuitively,
picking an edge which is ranked very highly should correspond to a good chance
of winning. A high value means that even against an optimal player we can very 
likely close a winning loop, and, due to beta decay, do so quickly, thus minimizing 
the chance for an error. 

Recall that our goal is to provide a good initial strategy. Thus, the exact 
values actually are irrelevant, since we only want to give the best edge as initial 
choice. Instead of trying to predict the exact value, we therefore want to learn 
this relative ranking. Formally, suppose we consider a system vertex $v \in V_S$ with
edges $E_v = \{(v,u) \mid (v,u) \in E\}$. A ranking of edges effectively corresponds to
a (total) order $<_v \subseteq E_v \times E_v$. The principle of pairwise ranking [30] suggests
that we learn a function $f : E_v \times E_v \to \{-1, 1\}$ that classifies pairs of edges
depending on which one is the better choice, i.e. $f(e,e') = 1$ if $e' <_v e$ and
$-1$ otherwise. However, such a function might not be perfect. For example, we
could get $f(e_1,e_2) = 1$, $f(e_2,e_3) = 1$, and $f(e_3,e_1) = 1$, which is incompatible
with any order. Thus, learning to rank suggests determining an ordering $<_v$ that
minimizes the inversions w.r.t. f, i.e. the number of cases where $f(e,e') = 1$ but
$e <_v e'$. This problem, called rank aggregation, is known to be NP-hard, and
we employ a greedy approximation as suggested by [30].

Our concrete goal thus now is to learn such a function f based on the semantic 
labelling of the start and end vertices of the two edges. We want to employ 
machine learning for this purpose: While the high-level intuition of the semantic 
labelling is rather clear, the actual implementation used to obtain the games [24] 
employs numerous optimizations, separate cases, etc. To provide the reader with 
a sense of the complexity, we display a single edge in the automaton obtained 
for a simple formula in Fig. 4, and a real-world scenario in [23, Appendix A.1]. 


Fig. 4. A single transition in the automaton computed for the formula $(a \wedge \mathbf{G} b) \vee \mathbf{G}\mathbf{F} c$.


We proceed to describe (i) (some of) the features we use, i.e. which quantities 
we extract from the labelling, (ii) the model we employ, and (iii) the dataset and 
methodology used to train our model. 


5.1 Features 


In total, we have defined over 200 different features to convert the edges into 
a usable vector of reals. In the interest of space we only present the high-level 
ideas of a small subset which covers most interesting ideas. 

Since most information is contained in the states rather than in the edges 
themselves, the majority of our features are defined for the former. An edge is 
then either associated with the feature value of its successor or with the change
in a feature value between its predecessor and successor. As indicated in Fig. 4, 
the semantic labelling comprises several formulae, namely a “master” formula, 
which intuitively indicates the global state, and several “monitors” (which them- 
selves comprise several formulae), monitoring repeating sub-goals. We define base 
features, which convert a single formula to a single number. These features can 
then be applied to both the master as well as monitor formulae, where further 
aggregation is necessary. Some notable base features are the following: 


Number of Conjuncts We count the number of conjuncts if the top level 
operator is a conjunction and otherwise default to 1. The intuition behind this 
feature is that fewer conjuncts tend to correspond to a less constrained formula.
Further, reducing the number of conjuncts along an edge often means that 
sub-goals have been achieved. (We consider several further syntactic features 
such as the number of disjuncts, the height of the syntax tree, or the number 
of temporal operators, which all follow similar ideas.) 

Trueness Since this has proven to be a solid heuristic on its own, we again 
incorporate it as a feature. 

System Control This feature (and variations thereof) incorporates the
information of the variable partitioning by approximating how much impact the
choice of the system variables has on the truth value of the formula. Intu- 
itively, a higher system control is desirable. Further, this feature also coun- 
teracts false positives of, e.g., trueness, as high values of trueness are worth 
much less if the system has no control on whether one of the many satisfying 
assignments is played. 

Obligation Set This group of features is based on the idea of obligation sets as
introduced by [29]. In essence, an obligation set for a formula $\varphi$ is an
assignment that, if played indefinitely, satisfies the formula. Using the inductive
definition of [29], we can compute a formula $\varphi'$ whose satisfying assignments
are exactly the obligation sets of $\varphi$, see [23, Appendix A.2]. Using this new
formula, we can obtain numerous new features by applying other base
features to $\varphi'$. In particular, we are interested in the new formula’s trueness as
this indicates how many obligation sets exist. Further, we are interested in
its system control, as a higher value makes it more likely that the system can 
enforce at least one obligation set. 
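To make the notion of a base feature concrete, the sketch below implements the conjunct count and lifts it to an edge, once as the value of the successor and once as the change between predecessor and successor. The tuple encoding of formulas with an n-ary `"and"` and both function names are assumptions for this illustration, not the representation used in the implementation.

```python
def num_conjuncts(formula):
    """Number of top-level conjuncts; defaults to 1 for any other operator."""
    return len(formula) - 1 if formula[0] == "and" else 1


def edge_features(base_feature, predecessor, successor):
    """Feature value of the successor and its change along the edge."""
    after = base_feature(successor)
    return after, after - base_feature(predecessor)


a, b, c = ("ap", "a"), ("ap", "b"), ("ap", "c")
before = ("and", a, ("G", b), ("F", c))   # a & G b & F c
after = ("G", b)                          # only G b remains
features = edge_features(num_conjuncts, before, after)
```

The negative change in the example reflects that two conjuncts disappeared along the edge, which often means that sub-goals have been achieved.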


In addition to the base features, we define the following edge-specific features: 


Priority As priorities are crucial for winning a play, it is only natural to incor- 
porate that information in our features. However, as SVMs struggle with par- 
ity information, we reorder the priorities by how beneficial they are for the 
system and map them to $[-1, 1]$ (similar to [22]). In particular, the smallest
odd priority gets mapped to 1 and the smallest even priority to $-1$. For this
normalization, we use an a-priori upper bound provided by the underlying 
automaton construction. 

Progress This feature is rather similar to [22]’s progress feature. We com- 
pute the percentage of already succeeded sub-goals of a monitor (instead of 
their trueness) and aggregate by weighted average (rather than maximum).
Additionally, we introduce punishments for failing monitors. Intuitively, this 
encourages long-term progress for temporal goals. 

One Step Here, the idea is to recommend an assignment that is to be played 
in the current state by traversing the syntax tree and propagating recom- 
mendations upwards, which is inspired by message passing in graph neural 
networks. For example, if we see $a \wedge b$ we strongly recommend playing a and
b, if we see $\mathbf{F}(a \wedge b)$ we take the previous recommendation and tune it down,
since F is “less urgent”. The feature value is obtained by measuring how well 
the valuation of an edge aligns with the recommended assignment. 


5.2 Pair Classification by Support Vector Machines 


To instantiate our pair classification function f, we opt for support vector 
machines. In principle, one could employ any binary classifier, which is why 
we also experimented with other models such as decision trees, random forests 
or gradient boosted trees. However, SVMs proved to perform best, which we 
attribute to their great ability to generalize due to their margin maximiz- 
ing nature [30]. Additionally, SVMs are rather simple (compared to our other 
options) and provide us with extra information known as confidence. Given by 
the distance of the predicted sample to the decision hyperplane, its magnitude 
can be interpreted as how confident the SVM is in its prediction. We denote the 
confidence of a pair $(e_1,e_2)$ by $c(e_1,e_2)$ and use it to slightly alter the greedy
ranking algorithm from the literature. To rank the edges of a vertex v, each edge
$e \in E_v$ gets assigned a score $s(e) = \sum_{e' \in E_v, e' \neq e} c(e, e')$. Recall that if we predict
$e <_v e'$, the confidence is negative. Finally, we rank the edges according to their
score, where a higher score corresponds to a better edge, and the recommended 
strategy is obtained by playing the highest ranked edge for each state. 
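The score-based ranking can be sketched in a few lines. The `confidence` argument is a hypothetical callable standing in for the signed distance to the decision hyperplane (in scikit-learn, such a value is exposed by the `decision_function` method of SVM classifiers); here it is replaced by a hand-written table.

```python
def rank_edges(edges, confidence):
    """Rank edges by the sum of pairwise confidences, best edge first."""
    scores = {
        e: sum(confidence(e, other) for other in edges if other != e)
        for e in edges
    }
    return sorted(edges, key=lambda e: scores[e], reverse=True)


# Hand-written, antisymmetric confidences for three edges of one vertex.
table = {("e1", "e2"): 0.8, ("e1", "e3"): -0.1, ("e2", "e3"): 0.5}


def confidence(e, other):
    if (e, other) in table:
        return table[(e, other)]
    return -table[(other, e)]


ranking = rank_edges(["e1", "e2", "e3"], confidence)
recommended_edge = ranking[0]
```

Note that `e1` is ranked first although it is predicted to be slightly worse than `e3`: summing confidences lets a strong prediction outweigh a weak one, which also resolves cyclic predictions without searching for an optimal order.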


5.3 Further Notes on Implementation 


In addition to the feature extraction, there are several other engineering aspects, 
which are crucial for the final performance. In this section, we comment on the 
three most important ones. 


Statewise Feature Normalization. Before passing the features to the model, we 
proceed to normalize them. Due to possible future applications in on-the-fly 
solvers, we only consider feature values of edges from the same state for this 
normalization. The crucial observation is that this already introduces compara- 
tive information in the features. A normalized trueness value of 1, for example, 
means this edge has the best trueness among all other edges from their state 
although it does not tell us anything about its absolute value. While the latter 
might also be important in theory, we observed that in practice the statewise 
normalized value is more important with only a few exceptions. 
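As an illustration, statewise normalization could be realized by min-max scaling over the outgoing edges of a single state. The concrete scaling is an assumption (the text only implies that the best edge obtains the value 1), as is the choice to map all values to 0 when they coincide.

```python
def normalize_statewise(values):
    """Min-max scale the feature values of all edges leaving one state."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # No comparative information: all edges look the same.
        return [0.0 for _ in values]
    return [(x - lowest) / (highest - lowest) for x in values]


normalized = normalize_statewise([2.0, 3.0, 6.0])
```

Because only edges of the same state are involved, the computation needs no global statistics of the game, which is what makes it compatible with on-the-fly exploration.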
State Classification. We observed several significantly different behaviours
required in different states. For example, in some states we need to exclusively 
focus on the master formula, while in others only the monitors play a role. This 
also relates to the underlying principles of the automaton construction. It is very 
difficult, especially for a simple model like an SVM, to switch between different 
behaviours. We divide states into three groups which approximate the different 
classes, and train separate models for each class. The three classes we suggest are 
(i) states without monitors, (ii) states where the master formula does not change 
in any successor, and (iii) states that fall into neither category. In addition to
having the separate models learn separate behaviours, we can also provide them 
with separate feature sets that only include relevant information. For example, 
the first class only requires features of the master formula, whereas these can be 
neglected in the second one. 


Complement Construction. The underlying automaton construction uses the fact 
that the system being able to enforce satisfaction of a formula $\varphi$ is equivalent to
the environment being able to enforce falsification of $\neg\varphi$. In other words, solving
the game for the negated formula and swapped roles yields the same result.
However, in the game obtained for $\neg\varphi$ the role of “system”, the player who chooses
second and for which we learnt the recommendation, i.e. for transitions from 
states (p, v) to q, now corresponds to the original environment. This drastically 
changes the meaning of features. For example, a trueness of 0 suddenly is very 
desirable. We tackle this by training separate models for both cases. Together 
with state classification, this yields a total of 6 different models that we assemble 
for our heuristic. 
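The selection among the six models can be pictured as a lookup keyed by the state class and by whether the complement construction was used. The sketch below is hypothetical in all its names and assumes that a state is described by its list of monitors, its master formula, and the master formulae of its successors.

```python
from itertools import product

CLASSES = ("no_monitors", "master_unchanged", "other")


def state_class(monitors, master, successor_masters):
    """Assign a state to one of the three classes described above."""
    if not monitors:
        return "no_monitors"
    if all(m == master for m in successor_masters):
        return "master_unchanged"
    return "other"


def model_key(monitors, master, successor_masters, complemented):
    """Key of the model responsible for a state: (state class, complement flag)."""
    return state_class(monitors, master, successor_masters), complemented


ALL_MODEL_KEYS = list(product(CLASSES, (False, True)))

key = model_key(["GF c"], "a & G b", ["a & G b", "a & G b"], complemented=True)
```

Each key would then map to a separately trained classifier with its own feature set.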


5.4 Training the Model 


With these ideas at hand, we conclude this section by discussing our dataset, in 
particular how we preprocess it, and how we train our model. 


Dataset and Preprocessing. As one of our goals is to exploit human bias in
writing LTL formulae, the foundation of our dataset is given by the LTL benchmarks
of SYNTCOMP (available on GitHub at https://github.com/SYNTCOMP/benchmarks).
To further augment the data, we mutate these formulae by
randomly replacing temporal operators. This yields new (random) samples that
syntactically resemble the original, human-written structure. For practical
reasons, we only consider formulae which can be converted to a DPA within 10 min.
Ultimately, this leaves us with 405 original and 514 mutated formulae, of which
we use 60% each for training, 20% for validation, and 20% for evaluation.

Obtaining the edge pairs for training requires several further steps. First of
all, we exclude trivial cases that can easily be detected by simple rules (see Sect.
4.3), allowing our model to focus on complicated cases. Further, we exclude pairs
where the ground truth value happens to be equal, as it is unclear which edge
the model should predict. In particular, we exclude all edges originating in losing
states (since there is no sensible action to recommend). Finally, we only include
a limited number of pairs per game in the training set: Pairs of the same game
tend to look similar, thus a few disproportionately large games would result in
a very unbalanced dataset. All remaining edge pairs are added in both orders,
i.e. $((e_1, e_2), y)$ and $((e_2, e_1), -y)$, where $y \in \{1, -1\}$ determines which edge is
better, in order to prioritize teaching symmetry to the model.
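The construction of symmetric training pairs for a single state can be sketched as follows. The sketch assumes that ground-truth values and feature vectors are given as dictionaries keyed by edge and that the input of the classifier is the concatenation of both feature vectors; the per-game limit on the number of pairs is omitted, and the function name is hypothetical.

```python
def make_pairs(edge_values, features):
    """Create labelled pairs in both orders, skipping pairs with equal value."""
    samples = []
    edges = sorted(edge_values)
    for i, e1 in enumerate(edges):
        for e2 in edges[i + 1:]:
            if edge_values[e1] == edge_values[e2]:
                continue                      # no preferred edge, no label
            y = 1 if edge_values[e1] > edge_values[e2] else -1
            samples.append((features[e1] + features[e2], y))
            samples.append((features[e2] + features[e1], -y))
    return samples


edge_values = {"a": 0.9, "b": 0.4, "c": 0.4}
features = {"a": [1.0], "b": [0.0], "c": [0.5]}
samples = make_pairs(edge_values, features)
```

In the example, the pair of `b` and `c` is dropped because both edges have the same ground-truth value, and every remaining pair appears twice with opposite labels.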


Training. For each of the 6 models, we first compute mean and standard devi- 
ation of the respective training set and use them to standardize the input to 
N(0,1). Further, we perform recursive feature elimination for each state class 
individually, adapted to features appearing twice (once for each input edge). For 
each state class, we ended up with 30 to 40 features.

For the actual training process, we performed an extensive grid search for 
several model types (decision trees, random forests, etc., see Sect. 5.2) in order 
to determine suitable values for the hyper-parameters. As mentioned earlier, 
we ultimately opted for the SVMs due to their simplicity and generalization 
abilities. 


6 Experimental Evaluation 


In this section, we present experimental evaluation of our tool SemML. The model 
was learnt by communicating the relevant data to a Python process running 
scikit-learn [35]. We then extracted the learnt weights and, based on them, 
implemented the recommendation procedure in Java, on top of Owl [24]. The 
artifact can be found at [1], which references a slightly improved version from 
the one we submitted to the artifact evaluation [2]. 


6.1 Evaluation Goals 


Our primary goal in this work is to show that our approach, enabled by our 
new ground truth, can be used to solve more complicated instances than the 
approach of [22], in particular formulae going beyond pure (co-)safety. Thus, our 
first evaluation goal is the following: 


Research Question 1: How much does our model based on SVM and the 
game tree ground truth outperform the trueness-based initial strategy rec- 
ommendation approach of [22]? 


We refer to the trueness-based initial strategy of [22] as TrueSI.

Although not the focus of this work, we ultimately want to improve synthesis 
through meaningful exploration guidance, in particular, by suggesting likely
winning edges. Thus, we are interested in how our prototype performs in a real-world
scenario. 


Research Question 2: How do initial strategies recommended by our app- 
roach synergize with state-of-the-art synthesis tools? 


We address both questions separately. 
6.2 RQ1: Quality of Initial Strategy


Datasets. To fairly compare to [22], we consider the same dataset, i.e. randomly 
generated LTL formulae, split into three categories: “(Co-)Safety”, “Near
(Co-)Safety”, and “Parity”. See [22] for details on how these are obtained. In essence,
the tool randltl [7] is used to generate random formulae with different biases.
Then, we filter out formulae which need more than 10 min to be translated to a 
parity automaton. As a second dataset, we also use some (original and mutated) 
SYNTCOMP formulae (the test set described in Sect. 5.4). We only consider 
formulae where the corresponding game can be won by system. We do this simply 
because we can only recommend on games which are winning — otherwise there 
is no preference on edges since every action is losing by definition. In total, this 
leaves 262 randomly generated formulae and 123 from SYNTCOMP. 


Metrics. We consider two metrics for our comparison. Firstly, similar to [22], 
we consider the fraction of immediately solved games, i.e. games where following 
actions recommended by SemML or TrueSI directly yields a winning strategy. In 
light of our motivation to augment SI solvers, we want to measure how “close” 
the recommended strategy is to being correct in case it is not immediately winning.
To this end, we feed it to (a modified version of) the parity game solver Oink 
[6] and compute the (relative) distance of the obtained strategy, as follows. We 
count the number of (reachable) states in which the winning strategy determined 
by Oink differs from the recommended one, i.e. how many “wrong” choices were 
recommended, and divide it by the total amount of (reachable) states. We note 
that this unfortunately induces a slight bias that we cannot measure: Oink may 
potentially change winning decisions because of internal details of the algorithm. 
Ideally, we would want to obtain the minimal distance over all winning strategies; 
however this quantity is intractable to compute due to the exponential size of 
the strategy space. Nevertheless, we believe that this measure strongly correlates 
with the quality of the strategy. 
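
The relative distance described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for the experiments: the representation of a strategy as a dictionary from states to chosen successors and the names `relative_distance` and `geometric_mean` are assumptions made for this example. The geometric mean is included because Table 1 aggregates the per-game distances that way.

```python
from math import exp, log
from typing import Dict, Hashable, Iterable

# Assumed representation: a strategy maps each state to its chosen successor.
Strategy = Dict[Hashable, Hashable]


def relative_distance(recommended: Strategy, winning: Strategy,
                      reachable: Iterable[Hashable]) -> float:
    """Fraction of reachable states where the two strategies pick different choices."""
    states = list(reachable)
    if not states:
        raise ValueError("need at least one reachable state")
    wrong = sum(1 for s in states if recommended.get(s) != winning.get(s))
    return wrong / len(states)


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean of strictly positive values (e.g. per-game distances)."""
    vals = list(values)
    if not vals or any(v <= 0 for v in vals):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return exp(sum(log(v) for v in vals) / len(vals))
```

Restricting the aggregate to games that were not solved immediately keeps every distance strictly positive, which the geometric mean requires.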

We argue that simply measuring the number of iterations required by strategy 
iteration to converge is too crude a metric: On the one hand, even a “very wrong” 
strategy can be changed to a winning strategy in a single iteration by changing 
the choice in every single state. On the other hand, even a nearly correct strategy, 
requiring only a handful of changes, may need as many iterations. Moreover, 
this additionally induces the same bias as above. 
Table 1. Summary of our comparison between TrueSI, the approach of [22], and our 
tool SemML. We first list the fraction of immediately winning strategies (larger is better), 
followed by the geometric mean of the relative distance, i.e. the fraction of states in 
which the decision was adapted by Oink to obtain a winning strategy (smaller is better). 
For the first comparison, we also consider random initialization as a baseline. For the 
second comparison to be fair, we only consider games where neither tool yielded an 
immediately winning strategy. 


Tool             (Co-)Safety   Near (Co-)Safety   Parity   SYNTCOMP 
Immediately Solving 
  TrueSI         100%          85%                66%      44% 
  SemML          99%           95%                88%      85% 
  Random         7%            2%                 5%       3% 
Relative Distance 
  TrueSI         -             75%                45%      29% 
  SemML          -             52%                28%      16% 
  Ratio of both  -             1.4                1.6      1.8 

Fig. 5. A detailed comparison on SYNTCOMP formulae. The left plot compares how 
many games were immediately solved, grouped by size and considering the (arithmetic) 
mean in each group. SemML’s values are displayed by crosses, TrueSI by circles. The 
right plot compares the relative distance of SemML’s and TrueSI’s solutions. 


Expectations. Since our approach incorporates trueness as one of its many fea- 
tures, we expect that our approach should be at least on par with the previous 
one of [22]. As we also consider long-term temporal information beyond true- 
ness, we particularly expect to outperform TrueSI on larger, more complicated 
instances. 


Results. We ran this evaluation on consumer hardware (Intel Core i7-8565U 
with 16 GB RAM). We summarize our findings in Table 1. Clearly, our approach 
vastly outperforms the previous one. In particular, while TrueSI perfectly 
handles (co-)safety formulae, its performance quickly drops when going to more 
complicated formulae. In comparison, SemML solves the vast majority of 
formulae immediately, even on the quite complicated SYNTCOMP dataset. We 
note that these findings are not “absolute” (as is to be expected from machine 
learning approaches): there are a few instances where the previous approach does 
perform better. Our baseline comparison to a random initialization approach 
validates that both approaches indeed solve a non-trivial problem. 

Since we are particularly interested in complex, “human written” formulae, 
we investigate the SYNTCOMP dataset more closely. In Fig. 5, we provide a 
more detailed view on our two metrics. First, we investigate how the “immedi- 
ately solving” performance evolves in comparison to the size of the game, which 
intuitively correlates with the difficulty of the synthesis question. We observe 
that SemML solves practically all smaller games and still performs well on larger 
games, compared to TrueSI, which quickly falls off. The second plot displays the 
relative distances for each instance which neither recommendation solved imme- 
diately. We clearly see that the strategies recommended by SemML are better in 
almost all cases. 

This positively answers our first question. Aside from the direct comparison 
to the previous approach, the significant percentage of immediately solved games 
gives us an interesting implication: If SemML solves many games immediately, we 
can use SemML as a best-effort guidance tool for reactive synthesis questions 
which are intractably large to solve. Moreover, SemML thus presents us with a 
constant size representation of a winning strategy for many games, effectively 
described by approximately a few hundred SVM weights compared to a decision 
table for thousands of states in each game. 


6.3 RQ2: On-the-fly SemML 


In our second experiment, we evaluate the suitability of SemML for real-world 
parity game solving by using it as a guidance tool for the state-of-the-art reactive 
synthesis tool Strix [33]. 


Strix’ Anatomy. We first briefly describe how Strix works and how it uses 
guidance heuristics. In essence, Strix builds the parity game on-the-fly, i.e. 
iteratively constructs parts of the game it deems important. Then, two strategy 
improvements are running in parallel, one for either player. Not yet explored 
states are treated as losing for both. In this way, if we find a winning strategy for 
either player on the constructed part of the game, it is winning for the complete 
game. Otherwise, we need to explore further. Here, a key ingredient for practical 
efficiency is a heuristic to decide which states should be explored first: If we 
explore states reachable under the “smallest” winning strategy, we naturally find 
this strategy as quickly as possible. In its current form, Strix employs trueness 
for this guidance and selects an automaton edge with the globally highest trueness 
for exploration. (Dually, edges with the lowest trueness are also followed, since 
these are “promising” for the environment.) 


Integration. We integrate SemML with Strix as follows. Suppose we are asked to 
compute a global score for an automaton edge e = (p,q) (recall that SemML gives 
local advice on edges in the game). We explicitly build up the game between the 
automaton states p and q, i.e. all choices of the environment in p followed by the 
respective system choices. For each occurring system state s, we compute the 
SemML ranking score as explained in Sect. 5.2, i.e. the confidence-based score. 
This only gives us local information: the magnitude of our score only reflects the 
preference relative to actions available in the system state s = (p,v). Since the 
previously used trueness proved to be a good indicator for global progress, we 
multiply our local score by this global value. Finally, to obtain a value for the 
automaton edge, we take the minimal value of all arising system states, since 
the environment chooses first. We additionally apply straightforward rules such 
as assigning values of 0 and 1 to ff and tt states, respectively. Moreover, 
Strix by default employs a decomposition approach, which does not build a 
single DPA. SemML would not be applicable in that mode, so we disable the 
decomposition for the purpose of evaluation. 
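
The combination rule described above (local score times trueness, minimized over the arising system states) can be sketched as follows. This is only an illustration under stated assumptions: the function name `edge_score`, its parameters, and the string labels `"ff"` and `"tt"` are hypothetical and do not correspond to the actual interfaces of Strix, Owl, or SemML.

```python
from typing import Iterable, Optional


def edge_score(local_scores: Iterable[float], trueness: float,
               target: Optional[str] = None) -> float:
    """Global score for one automaton edge (p, q).

    local_scores: hypothetical SemML ranking scores, one per system state that
                  arises between p and q (one for each environment choice).
    trueness:     the global trueness value used to scale the local scores.
    target:       "ff" or "tt" if q is a rejecting or accepting sink, else None.
    """
    if target == "ff":
        return 0.0  # edges into ff get the lowest score
    if target == "tt":
        return 1.0  # edges into tt get the highest score
    scaled = [score * trueness for score in local_scores]
    if not scaled:
        raise ValueError("expected at least one system state")
    # The environment moves first, so the edge is only as good as its worst case.
    return min(scaled)
```

Taking the minimum is the pessimistic choice: a single environment move with a poor system response lowers the score of the whole edge.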


Dataset. We considered 188 randomly selected formulae of SYNTCOMP (which 
were not used in the training of the model), also including unrealizable ones. 


Metrics. We evaluate the total required time to solve the game and compare 
to Strix in its normal configuration. Since we expect the unoptimized compu- 
tation of SemML’s advice to take considerable time, we separately measure the 
required time and additionally perform a comparison with this time subtracted. 
Since our scoring function is a straightforward SVM, we strongly believe that by 
tailoring the evaluation to Strix’ requirements, it can be significantly sped up. 
In particular, our advice computation re-constructs information which is com- 
puted during the exploration of the automaton but difficult to access without 
significant changes to both Strix and Owl. 


Expectations. We do not expect this approach to work to its full potential 
because Strix’ architecture does not exactly fit our approach (recall that our 
primary motivation was to compare to [22]). We discuss these differences and 
possible ways to address them later. Moreover, as we construct the intermedi- 
ate game states for every recommendation and evaluate the recommender SVM 
several times, we expect that significant time is spent computing the advice of 
SemML. 


Results. We conducted our experiments on a server with an Intel Xeon E5-2630 
v4 processor with 256 GiB of RAM and employed a 10 min timeout per execution. 
We summarize our findings in Fig. 6. Strikingly, our approach already 
performs favourably, despite the differences in architecture, hardly optimized 
advice computation, and no specific re-training for the task at hand. Exclud- 
ing the time spent for advice computation, our approach performs significantly 
better in practically all instances. This answers our second question positively, 
too. 


Fig. 6. Scatter plot comparing Strix with guidance provided by SemML and the default 
Trueness. On the left, we depict the total runtime excluding time spent for computing 
the guidance, and on the right we show the total time. We plot all models for which at 
least one method produced a result and count timeouts as 20 min (twice the timeout 
of 10 min). Note that the plot is logarithmic. The dashed lines denote a 10x difference. 


Adapting SemML to Strix. In order to adapt our underlying approach, we require 
several non-trivial changes to SemML. We discuss the “mismatches” between the 
current approach and Strix, and how they could be addressed. First, Strix selects 
a globally optimal edge to explore while SemML suggests actions locally. In 
particular, our scoring is not trained to compare edges of two different states. 
While trueness seems to be a good compromise for the time being, we believe that (through 
significant engineering effort) Strix can be modified to accommodate local rec- 
ommendations, or, alternatively, a more sophisticated indicator of a state’s global 
relevance can be learnt. Second, Strix performs two searches, one for the 
environment and one for the system player. However, the parity games we deal with 
are not entirely symmetric: the environment always moves first. Thus, we cannot 
directly apply SemML’s ranking to environment states, as they have a different 
structure. Here, we believe that the best solution is to train a separate model for 
the environment (or rather, six further models). Third, Strix only constructs the 
automaton explicitly and computes the game implicitly. As such, Strix requests scoring 
information only for edges in the automaton and not in the game. This can be 
addressed by closely integrating the scoring computation with the exploration of 
the automaton — instead of rebuilding the game for each edge (p, q), we can com- 
pute all scores for all outgoing edges of p at once. Finally, as we mentioned, Strix 
by default applies a decomposition approach which builds several sub-automata. 
These also are equipped with semantic labelling, however with a different mean- 
ing — enough to create a significant hurdle for our learning approach. We note 
that Strix actually builds automata by communicating with Owl through a highly 
optimized interface between Java and C++, significantly complicating passing 
information back and forth between the processes. 


7 Conclusion 


We demonstrated that semantic labelling can be exploited for practical gains in 
LTL synthesis. Our experimental evaluation shows that we vastly outperform the 
simple approach of [22], the first step in this direction. Moreover, despite several 
mismatches, our approach shows promising results for real-world applications of 
this idea, i.e. when combined with the state-of-the-art tool Strix. 


Future Work. As discussed above, the main point for future work is a tight, 
tailored integration with Strix. In particular, we want to modify our approach to 
be applicable to the decomposition methods of Strix, modify Strix to consider 
local guidance, and actually learn for the precise task required by Strix. 

Aside from this, we believe that there might be further interesting features 
(hand-crafted or learnt) which could provide us with additional insights. In par- 
ticular, we want to employ automated feature extraction, through more sophis- 
ticated model architectures such as transformers or graph neural networks. 
Abstract. The difficulty of manually specifying reward functions has 
led to an interest in using linear temporal logic (LTL) to express objec- 
tives for reinforcement learning (RL). However, LTL has the downside 
that it is sensitive to small perturbations in the transition probabilities, 
which prevents probably approximately correct (PAC) learning without 
additional assumptions. Time discounting provides a way of removing 
this sensitivity, while retaining the high expressivity of the logic. We 
study the use of discounted LTL for policy synthesis in Markov decision 
processes with unknown transition probabilities, and show how to reduce 
discounted LTL to discounted-sum reward via a reward machine when 
all discount factors are identical. 


1 Introduction 


Reinforcement learning [39] (RL) is a sampling-based approach to synthesis in 
systems with unknown dynamics where an agent seeks to maximize its accu- 
mulated reward. This reward is typically a real-valued feedback that the agent 
receives on the quality of its behavior at each step. However, designing a reward 
function that captures the user’s intent can be tedious and error prone, and 
misspecified rewards can lead to undesired behavior, called reward hacking [5]. 
Due to the aforementioned difficulty, recent research [8,17,23,31,35] has 
shown interest in utilizing high-level logical specifications, particularly linear 
temporal logic [7] (LTL), to express intent. However, a significant challenge arises 
due to the sensitivity of LTL, similar to other infinite-horizon objectives like aver- 
age reward and safety, to small changes in transition probabilities. Even slight 
modifications in transition probabilities can lead to significant impacts on the 
value, such as enabling previously unreachable states to become reachable. With- 
out additional information on the transition probabilities, such as the minimum 
nonzero transition probability, LTL is proven to be not probably approximately 
correct (PAC) [29] learnable [3,43]. Ideally, it is desirable to maintain PAC learn- 
ability while still keeping the benefits of a highly expressive temporal logic. 


Fig. 1. Example showing non-robustness of safety specifications. 


Discounting can serve as a solution to this problem. Typically, discounting 
is used to encode time-sensitive rewards (i.e., a payoff is worth more today than 
tomorrow), but it has a useful secondary effect that payoffs received in the distant 
future have small impact on the accumulated reward today. This insensitivity 
enables PAC learning without requiring any prior knowledge of the transition 
probabilities. In RL, discounted reward is commonly used and has numerous 
associated PAC learning algorithms [29]. 

In this work, we examine the discounted LTL of [2] for policy synthesis 
in Markov decision processes (MDPs) with unknown transition probabilities. 
We refer to such MDPs as “unknown MDPs” throughout the paper. This logic 
maintains the syntax of LTL, but discounts the temporal operators. Discounted 
LTL gives a quantitative preference to traces that satisfy the objective sooner, 
and those that delay failure as long as possible. The authors of [2] examined 
discounted LTL in the model checking setting. Exploring policy synthesis and 
learnability for discounted LTL specifications is novel to this paper. 

To illustrate how discounting affects learnability, consider the example MDP 
from [32] shown in Fig. 1. It consists of a safe state $s_0$, two sink states 
$s_1, s_2$, and two actions $a_1, a_2$. Taking action $a_i$ in $s_0$ leads to a sink state 
with probability $p_i$ and stays in $s_0$ with probability $1 - p_i$. Suppose we are 
interested in learning a policy to make sure that the system always stays in the 
state $s_0$. Now consider two scenarios: one in which $p_1 = 0$ and $p_2 = \delta$, and 
another in which $p_2 = 0$ and $p_1 = \delta$, where $\delta > 0$ is a small positive value. In 
the former case, the optimal policy is to always choose $a_1$ in $s_0$, and in the 
latter case, we need to choose $a_2$ in $s_0$. Furthermore, it can be shown that a 
near-optimal policy in one case is not near-optimal in the other. However, we 
cannot select a finite number of samples needed to distinguish between the two 
cases (with high probability) without knowledge of $\delta$. In contrast, the 
time-discounted semantics of the safety property evaluates to $1 - \lambda^k$ where $k$ 
is the number of time steps spent in the state $s_0$. Then, for sufficiently small 
$\delta$, any policy achieves a high value w.r.t. the discounted safety property in 
both scenarios. In general, small changes to the transition probabilities do not 
have drastic effects on the nature of near-optimal policies for discounted 
interpretations of LTL properties. 
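
To make the last claim concrete, the following sketch computes the expected discounted safety value of the policy that always plays the "wrong" action, i.e. the one that leaves $s_0$ with probability $\delta$ in every step. It assumes that $k$ counts the steps spent in $s_0$ starting from 1, so that $k$ is geometrically distributed and the expectation of $1 - \lambda^k$ has the closed form $1 - \delta\lambda / (1 - \lambda(1 - \delta))$. The function names are illustrative and not taken from the paper.

```python
def discounted_safety_value(delta: float, lam: float) -> float:
    """Expected value of 1 - lam**k when each step leaves s_0 with probability delta.

    Assumption: P(k = j) = (1 - delta)**(j - 1) * delta for j >= 1, and the run
    has value 1 if s_0 is never left (which happens almost surely iff delta == 0).
    """
    if not (0.0 <= delta <= 1.0 and 0.0 <= lam < 1.0):
        raise ValueError("need 0 <= delta <= 1 and 0 <= lam < 1")
    return 1.0 - delta * lam / (1.0 - lam * (1.0 - delta))


def discounted_safety_value_truncated(delta: float, lam: float, horizon: int) -> float:
    """Lower bound on the same expectation obtained by summing the first terms."""
    total = 0.0
    for k in range(1, horizon + 1):
        total += (1.0 - delta) ** (k - 1) * delta * (1.0 - lam ** k)
    # Runs that survive the horizon are worth at least 1 - lam**horizon.
    total += (1.0 - delta) ** horizon * (1.0 - lam ** horizon)
    return total
```

For a fixed discount factor, the value of the "wrong" policy tends to 1 as $\delta$ tends to 0, so its gap to the optimal value of 1 shrinks with $\delta$.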


Table 1. Policy synthesis in MDPs for different classes of specifications. 

Specification              Memory           Policy Synthesis Algorithm 
                                            Known MDP        PAC Learning 
Reward Machines            Finite [24,34]   Exists [24,34]   Exists [38] 
LTL                        Finite [7]       Exists [7]       Impossible [3,43] 
Discounted LTL             Infinite         Open             Exists 
Uniformly Discounted LTL   Finite           Exists           Exists 


Contributions. Table 1 summarizes the results of this paper in the context of known 
results regarding policy synthesis for various classes of specifications. We consider 
three key properties of specifications, namely, (1) whether there is a finite-state 
optimal policy and whether there are known algorithms for (2) computing an 
optimal policy when the MDP is known, as well as for (3) learning a near-optimal 
policy when the transition probabilities are unknown (without additional 
assumptions). The classes of specifications include reward machines with 
discounted-sum rewards [24], linear temporal logic (LTL) [7], discounted LTL, and 
a variant of discounted LTL in which all discount factors are identical, which 
we call uniformly discounted LTL. In this paper, we show the following. 


- In general, finite-memory optimal policies may not exist for discounted LTL 
  specifications. 

- There exists a PAC learning algorithm to learn policies for discounted LTL 
  specifications. 

- There is a reward machine for any uniformly discounted LTL specification 
  such that the discounted-sum rewards capture the semantics of the 
  specification. From this we infer that for any given MDP finite-memory optimal 
  policies exist and can be computed. 


Related Work. Linear temporal logic (LTL) is a popular and expressive formalism 
to unambiguously express qualitative safety and progress requirements of Kripke 
structures and MDPs [7]. The standard approach to model check LTL formu- 
las against MDPs is the automata-theoretic approach where the LTL formulas 
are first translated to a class of good-for-MDP automata [20], such as limit- 
deterministic Büchi automata [18,36,37,40], and then, efficient graph-theoretic 
techniques (computing accepting end-components and then maximizing the 
probability to reach states in such components) [13,30,40] over the product of the 
automaton with the MDP can be used to compute optimal satisfaction proba- 
bilities and strategies. Since LTL formulas can be translated into (deterministic) 
automata in doubly exponential time, the probabilistic model checking problem 
is in 2EXPTIME with a matching lower bound [11]. 

Several variants of LTL have been proposed that provide discounted temporal 
modalities. De Alfaro et al. [15] proposed an extension of the μ-calculus with 
discounting and showed [14] the decidability of model-checking over finite MDPs. 
Mandrali [33] introduced discounting in LTL by taking a discounted sum inter- 
pretation of logic over a trace. Littman et al. [32] proposed geometric LTL as a 
logic to express learning objectives in RL. However, this logic has unclear seman- 
tics for nesting operators. Discounted LTL was proposed by Almagor, Boker, and 
Kupferman [2], which considers discounting without accumulation. The 
decidability of the policy synthesis problem for discounted LTL against MDPs is an 
open problem. 

An alternative approach to discounting that ensures PAC learnability is to 
introduce a fixed time horizon, along with a temporal logic for finite traces. In 
this setting, the logic $\mathrm{LTL}_f$ is the most popular [10,16]. Using $\mathrm{LTL}_f$ with a 
finite horizon yields simple algorithms [41], as finite automata suffice for 
checking properties, but at the expense of the expressivity of the logic: formulas 
like GFp and FGp both mean that p occurs at the end of the trace. 

There has been a lot of recent work on reinforcement learning from temporal 
specifications [1,9,16,19,21,22,24-28,31,32,42,44]. Such approaches often lack 
strong convergence guarantees. Some methods have been developed to reduce 
LTL properties to discounted-sum rewards [8,19] while preserving optimal poli- 
cies; however they rely on the knowledge of certain parameters that depend 
on the transition probabilities of the unknown MDP. Recent work [3,32,43] has 
shown that PAC algorithms that do not depend on the transition probabilities do 
not exist for the class of LTL specifications. There has also been work on learn- 
ing algorithms for LTL specifications that provide guarantees when additional 
information about the MDP (e.g., the smallest nonzero transition probability) 
is available [6,12,17]. 


2 Problem Definition 


An alphabet $\Sigma$ is a finite set of letters. A finite word (resp. $\omega$-word) over $\Sigma$ is 
defined as a finite sequence (resp. $\omega$-sequence) of letters from $\Sigma$. We write $\Sigma^*$ 
and $\Sigma^\omega$ for the sets of finite words and $\omega$-words over $\Sigma$. 

A probability distribution over a finite set $S$ is a function $d : S \to [0,1]$ such 
that $\sum_{s \in S} d(s) = 1$. Let $\mathcal{D}(S)$ denote the set of all discrete distributions over $S$. 


Markov Decision Processes. A Markov Decision Process (MDP) is a tuple $\mathcal{M} = 
(S, A, s_0, P)$, where $S$ is a finite set of states, $s_0$ is the initial state, $A$ is a finite 
set of actions, and $P : S \times A \to \mathcal{D}(S)$ is the transition probability function. An 
infinite run $\psi \in (S \times A)^\omega$ is a sequence $\psi = s_0 a_0 s_1 a_1 \ldots$, where $s_i \in S$ and 
$a_i \in A$ for all $i \in \mathbb{Z}_{\geq 0}$. For any run $\psi$ and any $i \leq j$, we let $\psi_{i:j}$ denote the 
subsequence $s_i a_i s_{i+1} a_{i+1} \ldots a_{j-1} s_j$. Similarly, a finite run $h \in (S \times A)^* \times S$ is a 
finite sequence $h = s_0 a_0 s_1 a_1 \ldots a_{t-1} s_t$. We use $\mathcal{Z}(S, A) = (S \times A)^\omega$ and 
$\mathcal{Z}_f(S, A) = (S \times A)^* \times S$ to denote the set of infinite and finite runs, respectively. 

A policy $\pi : \mathcal{Z}_f(S, A) \to \mathcal{D}(A)$ maps a finite run $h \in \mathcal{Z}_f(S, A)$ to a 
distribution $\pi(h)$ over actions. We denote by $\Pi(S, A)$ the set of all such policies. A 
policy $\pi$ is deterministic if, for all finite runs $h \in \mathcal{Z}_f(S, A)$, there is an action 
$a \in A$ with $\pi(h)(a) = 1$. 

Given a finite run $h = s_0 a_0 \ldots a_{t-1} s_t$, the cylinder of $h$, denoted by $\mathrm{Cyl}(h)$, 
is the set of all infinite runs with prefix $h$. Given an MDP $\mathcal{M}$ and a policy 
$\pi \in \Pi(S, A)$, we define the probability of the cylinder set by 

\[ \mathcal{D}^{\mathcal{M}}_{\pi}(\mathrm{Cyl}(h)) = \prod_{i=0}^{t-1} \pi(h_{0:i})(a_i) \, P(s_i, a_i, s_{i+1}). \] 

It is known that $\mathcal{D}^{\mathcal{M}}_{\pi}$ can be uniquely extended 
to a probability measure over the $\sigma$-algebra generated by all cylinder sets. Let $\mathcal{P}$ 
be a finite set of atomic propositions and $\Sigma = 2^{\mathcal{P}}$ denote the set of all valuations 
of propositions in $\mathcal{P}$. An infinite word $\rho \in \Sigma^\omega$ is a map $\rho : \mathbb{Z}_{\geq 0} \to \Sigma$. 
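
As a small illustration of the cylinder probability, the product over the steps of a finite run can be computed directly. The encoding below is an assumption made for this sketch: a policy is a function from a history (a pair of state and action tuples) to a dictionary of action probabilities, and the transition function is a dictionary keyed by state-action pairs.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

History = Tuple[Tuple[Hashable, ...], Tuple[Hashable, ...]]
Policy = Callable[[History], Dict[Hashable, float]]
Transitions = Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]


def cylinder_probability(states: Sequence[Hashable], actions: Sequence[Hashable],
                         policy: Policy, transitions: Transitions) -> float:
    """Probability of Cyl(h) for h = s_0 a_0 ... a_{t-1} s_t.

    states holds s_0..s_t and actions holds a_0..a_{t-1}.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("a finite run has one more state than actions")
    prob = 1.0
    for i, action in enumerate(actions):
        # h_{0:i} = s_0 a_0 ... s_i is the history the policy sees before a_i.
        history = (tuple(states[: i + 1]), tuple(actions[:i]))
        choose = policy(history).get(action, 0.0)
        move = transitions[(states[i], action)].get(states[i + 1], 0.0)
        prob *= choose * move
    return prob
```

A run consisting of a single state has an empty product, so its cylinder has probability 1 in this sketch.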


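Definition 1 (Discounted LTL). Given a set of atomic propositions $\mathcal{P}$, 
discounted LTL formulas over $\mathcal{P}$ are given by the grammar 

\[ \varphi := b \in \mathcal{P} \mid \neg\varphi \mid \varphi \vee \varphi \mid X_\lambda \varphi \mid \varphi \, U_\lambda \, \varphi \] 

where $\lambda \in [0,1)$. Note that, in general, different temporal operators within the 
same formula may have different discount factors $\lambda$. For a formula $\varphi$ and a word 
$\rho = \sigma_0 \sigma_1 \ldots \in (2^{\mathcal{P}})^\omega$, the semantics $[\![\varphi, \rho]\!] \in [0,1]$ is given by 

\begin{align*}
[\![b, \rho]\!] &= \mathbb{1}(b \in \sigma_0) \\
[\![\neg\varphi, \rho]\!] &= 1 - [\![\varphi, \rho]\!] \\
[\![\varphi_1 \vee \varphi_2, \rho]\!] &= \max\{[\![\varphi_1, \rho]\!], [\![\varphi_2, \rho]\!]\} \\
[\![X_\lambda \varphi, \rho]\!] &= \lambda \cdot [\![\varphi, \rho_{1:\infty}]\!] \\
[\![\varphi_1 U_\lambda \varphi_2, \rho]\!] &= \sup_{i \geq 0} \Big\{ \min\Big\{ \lambda^i [\![\varphi_2, \rho_{i:\infty}]\!], \; \min_{0 \leq j < i} \big\{ \lambda^j [\![\varphi_1, \rho_{j:\infty}]\!] \big\} \Big\} \Big\}
\end{align*}

where $\rho_{i:\infty} = \sigma_i \sigma_{i+1} \ldots$ denotes the infinite word starting at position $i$. 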


Conjunction is defined using $\varphi_1 \wedge \varphi_2 = \neg(\neg\varphi_1 \vee \neg\varphi_2)$. We use 
$F_\lambda \varphi = \mathit{true} \, U_\lambda \, \varphi$ and $G_\lambda \varphi = \neg F_\lambda \neg\varphi$ to denote the discounted versions of 
the finally and globally operators, respectively. Note that when all discount 
factors equal 1, the semantics corresponds to the usual semantics of LTL. 

For this paper, we consider the case of strict discounting, where $\lambda < 1$. We 
refer to the case where the discount factor is the same for all temporal operators 
as uniform discounting. Our definition differs from [2] in two ways: 1) we discount 
the next operator, and 2) we enforce strict, exponential discounting. 


Example Discounted LTL Specifications. To develop an intuition of the semantics 
of discounted LTL, we now present a few example formulas and their meaning. 


- $F_\lambda p$ obtains a value of $\lambda^n$ where $n$ is the first index where $p$ becomes true 
  in a trace, and 0 if $p$ is never true. An optimal policy attempts to reach a 
  $p$-state as soon as possible. 

- $G_\lambda p$ obtains a value of $1 - \lambda^n$ where $n$ is the first index at which $\neg p$ occurs 
  in a trace, and 1 if $p$ always holds. An optimal policy attempts to delay reaching 
  a $\neg p$-state as long as possible. 

- $X_\lambda p$ obtains a value of $\lambda$ if $p$ is in the second position and 0 otherwise. 

- $p \vee X_\lambda q$ obtains a value of 1 if $p$ is in the first position of the trace, a value 
  of $\lambda$ if the trace begins with $\neg p$ followed by $q$, and a value of 0 otherwise. 

- $F_\lambda p \wedge G_\lambda q$ evaluates to the minimum of $\lambda^n$ and $(1 - \lambda^m)$, where $n$ is the first 
  position where $p$ becomes true in a trace and $m$ is the first position where 
  $q$ becomes false. If $n^* = \log_\lambda 0.5$ is the index where these two competing 
  objectives coincide, then the optimal policy attempts to stay within $q$-states 
  for the first $n^*$ steps and then attempts to reach a $p$-state as soon as possible. 

- Consider the formula $F_{\lambda_1} G_{\lambda_2} p$. Given a trace, consider a $p$-block of length $m$ 
  starting at position $n$, that is, $p$ holds at all positions from $n$ to $n + m - 1$, 
  and does not hold at position $n - 1$ (or $n$ is the initial position). The value 
  of such a block is $\lambda_1^n (1 - \lambda_2^m)$. The value of the trace is then the maximum 
  over values of all such $p$-blocks. The optimal policy attempts to have as long 
  a $p$-block as possible as early as possible. The discount factor $\lambda_1$ indicates the 
  preference for the $p$-block to occur sooner and the discount factor $\lambda_2$ indicates 
  the preference for the $p$-block to be longer. 

- $G_{\lambda_1} F_{\lambda_2} p$ obtains a value equivalent to $\neg F_{\lambda_1} G_{\lambda_2} \neg p$. Traces which contain 
  more $p$’s at shorter intervals are preferred. The discount factor $\lambda_1$ indicates 
  the preference for the total number of $p$’s to be larger and $\lambda_2$ indicates the 
  preference for the interval between the consecutive $p$’s to be shorter. 
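
The example values above can be checked mechanically. The following is a minimal sketch of the semantics of Definition 1, restricted to words of the form $u \cdot a^\omega$ (a finite prefix followed by one letter repeated forever). For such words and $\lambda < 1$, the terms of the supremum in the until semantics cannot grow once $i$ exceeds the length of the prefix, so a finite loop suffices. The tuple encoding of formulas and all function names are assumptions of this sketch, and `TRUE` is a convenience constant rather than part of the grammar.

```python
from typing import FrozenSet, Sequence, Tuple

Letter = FrozenSet[str]
TRUE = ("true",)  # convenience constant with value 1; not in the core grammar


def AP(name): return ("ap", name)
def NOT(phi): return ("not", phi)
def OR(a, b): return ("or", a, b)
def AND(a, b): return NOT(OR(NOT(a), NOT(b)))
def X(lam, phi): return ("next", lam, phi)
def U(lam, a, b): return ("until", lam, a, b)
def F(lam, phi): return U(lam, TRUE, phi)
def G(lam, phi): return NOT(F(lam, NOT(phi)))


def evaluate(phi: Tuple, prefix: Sequence[Letter], tail: Letter) -> float:
    """Value of phi on the word prefix . tail^omega (tail repeated forever)."""
    kind = phi[0]
    if kind == "true":
        return 1.0
    if kind == "ap":
        first = prefix[0] if prefix else tail
        return 1.0 if phi[1] in first else 0.0
    if kind == "not":
        return 1.0 - evaluate(phi[1], prefix, tail)
    if kind == "or":
        return max(evaluate(phi[1], prefix, tail), evaluate(phi[2], prefix, tail))
    if kind == "next":
        # Dropping the first letter of an empty prefix leaves tail^omega unchanged.
        return phi[1] * evaluate(phi[2], prefix[1:], tail)
    if kind == "until":
        _, lam, left, right = phi
        best, guard = 0.0, 1.0  # guard = min over j < i of lam^j * [[left, rho_j]]
        for i in range(len(prefix) + 1):
            suffix = prefix[i:]
            best = max(best, min(lam ** i * evaluate(right, suffix, tail), guard))
            guard = min(guard, lam ** i * evaluate(left, suffix, tail))
        return best
    raise ValueError(f"unknown formula node: {kind!r}")
```

For instance, with `p = frozenset({"p"})` and `no = frozenset()`, evaluating `F(0.5, AP("p"))` on the prefix `[no, no, p]` followed by `no` forever gives $0.5^2 = 0.25$, matching the first bullet. The recursion re-evaluates subformulas on every suffix, so it is only meant for small examples.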


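Policy Synthesis Problem. Given an MDP $\mathcal{M} = (S, A, s_0, P)$, we assume that we 
have access to a labelling function $L : S \to \Sigma$ that maps each state to the set of 
propositions that hold true in that state. Given any run $\psi = s_0 a_0 s_1 a_1 \ldots$ we can 
define an infinite word $L(\psi) = L(s_0) L(s_1) \ldots$ that denotes the corresponding 
sequence of labels. Given a policy $\pi$ for $\mathcal{M}$, we define the value of $\pi$ with respect 
to a discounted LTL formula $\varphi$ as 

\[ \mathcal{J}^{\mathcal{M}}(\pi, \varphi) = \mathbb{E}_{\psi \sim \mathcal{D}^{\mathcal{M}}_{\pi}} \big[ [\![\varphi, L(\psi)]\!] \big] \tag{1} \] 

and the optimal value for $\mathcal{M}$ with respect to $\varphi$ as $\mathcal{J}^*(\mathcal{M}, \varphi) = \sup_{\pi} \mathcal{J}^{\mathcal{M}}(\pi, \varphi)$. 
We say that a policy $\pi$ is optimal for $\varphi$ if $\mathcal{J}^{\mathcal{M}}(\pi, \varphi) = \mathcal{J}^*(\mathcal{M}, \varphi)$. Let $\Pi_{\mathrm{opt}}(\mathcal{M}, \varphi)$ 
denote the set of optimal policies. Given an MDP $\mathcal{M}$, a labelling function $L$ and 
a discounted LTL formula $\varphi$, the policy synthesis problem is to compute an 
optimal policy $\pi \in \Pi_{\mathrm{opt}}(\mathcal{M}, \varphi)$ when one exists. 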


Reinforcement Learning Problem. In reinforcement learning, the transition 
probabilities $P$ are unknown. Therefore, we need to interact with the environment to 
learn a policy for a given specification. In this case, it is sufficient to learn an 
$\varepsilon$-optimal policy $\pi$ that satisfies $\mathcal{J}^{\mathcal{M}}(\pi, \varphi) \geq \mathcal{J}^*(\mathcal{M}, \varphi) - \varepsilon$. We use 
$\Pi^{\varepsilon}_{\mathrm{opt}}(\mathcal{M}, \varphi)$ to denote the set of $\varepsilon$-optimal policies. Formally, a learning 
algorithm $\mathcal{A}$ is an iterative process which, in every iteration $n$, (i) takes a step 
in $\mathcal{M}$ from the current state, (ii) outputs a policy $\pi_n$, and (iii) optionally resets 
the current state to $s_0$. We are interested in probably approximately correct (PAC) 
learning algorithms. 


Definition 2 (PAC-MDP). A learning algorithm $\mathcal{A}$ is said to be PAC-MDP 
for a class of specifications $\mathcal{C}$ if there is a function $\eta$ such that for any $p > 0$, 
$\varepsilon > 0$, MDP $\mathcal{M} = (S, A, s_0, P)$, labelling function $L$, and specification $\varphi \in \mathcal{C}$, 
taking $N = \eta(|S|, |A|, |\varphi|, \frac{1}{p}, \frac{1}{\varepsilon})$, with probability at least $1 - p$, we have 

\[ \big| \{ n \mid \pi_n \notin \Pi^{\varepsilon}_{\mathrm{opt}}(\mathcal{M}, \varphi) \} \big| \leq N. \] 

It has been shown that there do not exist PAC-MDP algorithms for LTL 
specifications. Therefore, we are interested in the class of discounted LTL 
specifications that are strictly discounted, i.e. $\lambda < 1$ for every temporal operator. 


Policy Synthesis and Reinforcement Learning for Discounted LTL 421 


3 Properties of Discounted LTL 


In this section, we discuss important properties of discounted LTL regarding 
the nature of optimal policies. We first show that, under uniform discounting, 
the amount of memory required for the optimal policy may increase with the 
discount factor. We then show that, in general, allowing multiple discount factors 
may result in optimal policies requiring infinite memory. This motivates our 
restriction to the uniform discounting case in Sect. 4. We end this section by 
introducing a PAC learning algorithm for discounted LTL. 


3.1 Nature of Optimal Policies 


It is known that for any (undiscounted) LTL formula $\varphi$ and any MDP $\mathcal{M}$, 
there exists a finite memory policy that is optimal, i.e., the policy stores only 
a finite amount of information about the history. Formally, given an MDP $\mathcal{M} = 
(S, A, s_0, P)$, a finite memory policy $\pi = (M, \delta_M, \mu, m_0)$ consists of a finite set 
of memory states $M$, a transition function $\delta_M : M \times S \times A \to M$, an 
action function $\mu : M \times S \to \mathcal{D}(A)$, and an initial memory state $m_0 \in M$. Given 
a finite run $h = s_0 a_0 \ldots s_t = h' s_t$, the policy’s action is sampled from 
$\mu(\delta_M(m_0, h'), s_t)$ where $\delta_M$ is also used to represent the transition function 
extended to sequences of state-action pairs. We use $\Pi_f(S, A)$ to denote the set of 
finite memory policies. In this paper, we will show that uniformly discounted LTL 
admits finite memory optimal policies, but that infinite memory may be required 
for the general case. 
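
A finite memory policy can be pictured as a small state machine that reads the history and then picks an action. The sketch below encodes the tuple $(M, \delta_M, \mu, m_0)$ as a Python dataclass; the class and field names, as well as the counter-based policy `stay_then_move` with its action names `"stay"` and `"move"`, are illustrative assumptions and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Sequence


@dataclass
class FiniteMemoryPolicy:
    initial_memory: Hashable                                    # m_0
    update: Callable[[Hashable, Hashable, Hashable], Hashable]  # delta_M(m, s, a)
    act: Callable[[Hashable, Hashable], Dict[Hashable, float]]  # mu(m, s)

    def action_distribution(self, states: Sequence[Hashable],
                            actions: Sequence[Hashable]) -> Dict[Hashable, float]:
        """Distribution over actions after the run s_0 a_0 ... a_{t-1} s_t."""
        memory = self.initial_memory
        # Fold delta_M over the state-action pairs of h' = s_0 a_0 ... s_{t-1} a_{t-1}.
        for state, action in zip(states, actions):
            memory = self.update(memory, state, action)
        return self.act(memory, states[-1])


def stay_then_move(k: int) -> FiniteMemoryPolicy:
    """Counter policy with memory {0, ..., k}: play "stay" k times, then "move"."""
    return FiniteMemoryPolicy(
        initial_memory=0,
        update=lambda m, s, a: min(m + 1, k),
        act=lambda m, s: {"stay": 1.0} if m < k else {"move": 1.0},
    )
```

The counter policy needs $k + 1$ memory states, which is the kind of dependence on a step count that makes memory requirements grow in the examples of this section.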

Unlike (undiscounted) LTL, discounted LTL allows a notion of satisfaction 
quality. In discounted LTL, traces which satisfy a reachability objective sooner 
are given a higher value, and are thus preferred. If an LTL formula cannot be 
satisfied, the corresponding discounted LTL formula will assign higher values to 
traces which delay failure as long as possible. These properties of discounted LTL 
are desirable for enabling notions of promptness, but may yield more complex 
strategies which try to balance the values of multiple competing subformulas. 


Example 1. Consider the discounted LTL formula $\varphi = G_\lambda p \wedge F_\lambda \neg p$. This formula
contains two competing objectives that cannot both be completely satisfied.
Increasing the value of $G_\lambda p$ by increasing the number of $p$'s at the beginning of
the trace before the first $\neg p$ decreases the value of $F_\lambda \neg p$. Under the semantics of
conjunction, the value of $\varphi$ is the minimum of the values of the two subformulas.
Specifically, the value of $\varphi$ w.r.t. a word $\rho$ is


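\begin{align*}
[G_\lambda p \wedge F_\lambda \neg p, \rho] &= [\neg F_\lambda \neg p \wedge F_\lambda \neg p, \rho] \\
&= [\neg(F_\lambda \neg p \vee \neg F_\lambda \neg p), \rho] \\
&= 1 - \max\{[F_\lambda \neg p, \rho], [\neg F_\lambda \neg p, \rho]\} \\
&= 1 - \max\Big\{\sup_{i \ge 0}\{\lambda^i [\neg p, \rho_{i:\infty}]\},\; 1 - \sup_{i \ge 0}\{\lambda^i [\neg p, \rho_{i:\infty}]\}\Big\},
\end{align*}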


where $\rho_{i:\infty}$ is the trace starting from index $i$. Now consider a (deterministic)
MDP with two states $S = \{s_1, s_2\}$ and two actions $A = \{a_1, a_2\}$ in
which the agent can decide to either stay in $s_1$ or move to $s_2$ at any step and
the system stays in $s_2$ upon reaching $s_2$. This MDP can be seen in Fig. 2. We
have one proposition $p$ which holds in state $s_1$ and not in $s_2$. Note that all runs
produced by the example MDP are either of the form $s_1^\omega$ or $s_1^k s_2^\omega$. The discounted
LTL value of runs of the form $s_1^\omega$ is 0. The value of runs of the form $w = s_1^k s_2^\omega$ is


\[ v(k) = [\varphi, L(w)] = 1 - \max\{\lambda^k, 1 - \lambda^k\}. \]


A finite memory policy that stays in $s_1$ for $k$ steps will yield this value. Since $\lambda^k$ is
decreasing in $k$ and $1 - \lambda^k$ is increasing in $k$, the integer value of $k$ that maximizes
$v(k)$ lies in the interval $[\gamma - 1, \gamma + 1]$ where $\gamma \in \mathbb{R}$ satisfies $\lambda^\gamma = 1 - \lambda^\gamma$. Figure 2
shows this graphically. We have that $\gamma = \frac{\log(0.5)}{\log(\lambda)}$, which is increasing in $\lambda$.
Hence, the amount of memory required increases with increase in $\lambda$.




Fig. 2. An example showing that memory requirements for optimal policies may depend
on the discount factor. The red line is $\lambda^k$, the blue line is $1 - \lambda^k$ and the solid black
line is $v(k) = 1 - \max\{\lambda^k, 1 - \lambda^k\}$, where $k$ is the number of time steps one remains in
$s_1$. The dashed vertical line shows the value $\gamma$ where $v(k)$ is maximized. We have set
$\lambda = 0.99$. Note that changing the value of $\lambda$ corresponds to rescaling the x-axis. (Color
figure online)
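
The trade-off in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original development: the function names `v`, `best_k` and `gamma` are illustrative, and the optimum is found by brute force over a bounded range of $k$.

```python
import math


def v(k: int, lam: float) -> float:
    """Value of the run s1^k s2^omega, i.e. 1 - max(lam^k, 1 - lam^k)."""
    return 1.0 - max(lam ** k, 1.0 - lam ** k)


def best_k(lam: float, k_max: int = 10_000) -> int:
    """Brute-force search for the integer k in [0, k_max] that maximizes v(k)."""
    return max(range(k_max + 1), key=lambda k: v(k, lam))


def gamma(lam: float) -> float:
    """Real-valued crossing point where lam^gamma = 1 - lam^gamma."""
    return math.log(0.5) / math.log(lam)


if __name__ == "__main__":
    for lam in (0.5, 0.9, 0.99):
        print(f"lambda={lam}: best k={best_k(lam)}, gamma={gamma(lam):.2f}")
```

For each discount factor tried here, the brute-force optimum lies within one step of $\gamma$, and it grows as $\lambda$ approaches 1, matching the claim that the required counter grows with $\lambda$.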


The optimal strategy in the example above tries to balance the values of two
competing subformulas. We will now show that extending this idea to the general
case of multiple discount factors requires balancing quantities that are decaying
at different speeds. This balancing may require remembering an arbitrarily long
history of the trace, so infinite memory is required.


Theorem 1. There exists an MDP $\mathcal{M} = (S, A, s_0, P)$, a labelling function
$L$ and a discounted LTL formula $\varphi$ such that for all $\pi \in \Pi_f(S, A)$ we have
$\mathcal{J}^{\mathcal{M}}(\pi, \varphi) < \mathcal{J}^*(\mathcal{M}, \varphi)$.


Proof. Consider the MDP $\mathcal{M}$ depicted in Fig. 3. It consists of three states
$S = \{s_0, s_1, s_2\}$ and two actions $A = \{a_1, a_2\}$. The edges are labelled with
actions and the corresponding transition probabilities. There are two propositions
$\mathcal{P} = \{p_1, p_2\}$; $p_1$ holds true in state $s_1$ and $p_2$ holds true in state $s_2$.
The specification is given by $\varphi = F_{\lambda_1} G_{\lambda_2} p_1 \wedge F_{\lambda_2} p_2$ where $\lambda_1 < \lambda_2 < 1$.


Fig. 3. The need for infinite memory for achieving optimality in discounted LTL. 


For any run $\psi$ that never visits $s_2$, we have $[\varphi, L(\psi)] = 0$ since
$[F_{\lambda_2} p_2, L(\psi)] = 0$. Otherwise the run has the form $\psi = s_0^{k_0} s_1^{k_1} s_2^\omega$ where $k_0$
is stochastic and $k_1$ is a strategic choice by the agent. To show that this requires
an infinite amount of memory to play optimally, one just has to show that the
optimal choice of $k_1$ increases with $k_0$. This means that the agent must remember
$k_0$, the number of steps spent in the initial state, via an unbounded counter.
Note that every value of $k_0$ has a non-zero probability in $\mathcal{M}$ and therefore choosing
a suboptimal $k_1$ for even a single value of $k_0$ causes a decrease in value from
the policy that always chooses optimal $k_1$.

The value of the run $\psi$ is $[\varphi, L(\psi)] = \min(\lambda_1^{k_0}(1 - \lambda_2^{k_1}), \lambda_2^{k_0 + k_1})$. Note that
$\lambda_1^{k_0}(1 - \lambda_2^{k_1})$ increases with increase in $k_1$ and $\lambda_2^{k_0 + k_1}$ decreases with increase in
$k_1$. Therefore taking $\gamma \in \mathbb{R}$ to be such that $\lambda_1^{k_0}(1 - \lambda_2^{\gamma}) = \lambda_2^{k_0 + \gamma}$, the optimal
choice of $k_1$ lies in the interval $[\gamma - 1, \gamma + 1]$. Now $\gamma$ satisfies $1 = ((\lambda_2/\lambda_1)^{k_0} + 1)\lambda_2^{\gamma}$.
Since $\lambda_1 < \lambda_2 < 1$ we must have that $\gamma$ increases with increase in $k_0$. Therefore,
$k_1$ also increases with increase in $k_0$.
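
The dependence of the optimal $k_1$ on $k_0$ can be observed numerically. This is an illustrative sketch only: the discount factors 0.5 and 0.9 are arbitrary choices satisfying $\lambda_1 < \lambda_2 < 1$, the helper names are hypothetical, and the optimum is found by brute force over a bounded range.

```python
import math


def run_value(k0: int, k1: int, lam1: float, lam2: float) -> float:
    """Value of the run s0^k0 s1^k1 s2^omega from the proof of Theorem 1."""
    return min(lam1 ** k0 * (1.0 - lam2 ** k1), lam2 ** (k0 + k1))


def best_k1(k0: int, lam1: float, lam2: float, k_max: int = 2_000) -> int:
    """Brute-force search for the integer k1 in [0, k_max] maximizing the run value."""
    return max(range(k_max + 1), key=lambda k1: run_value(k0, k1, lam1, lam2))


def gamma(k0: int, lam1: float, lam2: float) -> float:
    """Real solution of 1 = ((lam2 / lam1)^k0 + 1) * lam2^gamma."""
    return -math.log((lam2 / lam1) ** k0 + 1.0) / math.log(lam2)


if __name__ == "__main__":
    lam1, lam2 = 0.5, 0.9  # illustrative discount factors with lam1 < lam2 < 1
    for k0 in range(6):
        print(k0, best_k1(k0, lam1, lam2), round(gamma(k0, lam1, lam2), 2))
```

In this instance the best $k_1$ strictly increases with $k_0$, so a policy that plays optimally has to distinguish every value of $k_0$.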


3.2 PAC Learning 


In the above discussion, we showed that one might need infinite memory to act
optimally w.r.t. a discounted LTL formula. However, it can be shown that for any
MDP $\mathcal{M}$, labelling function $L$, discounted LTL formula $\varphi$ and any $\varepsilon > 0$, there
is a finite-memory policy $\pi$ that is $\varepsilon$-optimal for $\varphi$. In fact, we can show that
this class of discounted LTL formulas admits a PAC-MDP learning algorithm.


Theorem 2 (Existence of PAC-MDP). There exists a PAC-MDP learning 
algorithm for discounted LTL specifications. 


Proof (sketch). Our approach to compute $\varepsilon$-optimal policies for discounted LTL
is to compute a policy which is optimal for $T$ steps. The policy will depend on
the entire history of atomic propositions that has occurred so far.

Given discounted LTL specification $\varphi$, the first step of the algorithm is to
determine $T$. We select $T$ such that for any two infinite words $\alpha$ and $\beta$ where the
first $T + 1$ indices match, i.e. $\alpha_{0:T} = \beta_{0:T}$, we have that $|[\varphi, \alpha] - [\varphi, \beta]| \le \varepsilon$. Say
that the maximum discount factor appearing in all temporal operators is $\lambda_{\max}$.
Due to the strict discounting of discounted LTL, selecting $T \ge \frac{\log \varepsilon}{\log \lambda_{\max}}$ ensures
that $|[\varphi, \alpha] - [\varphi, \beta]| \le \lambda_{\max}^T \le \varepsilon$.

Now we unroll the MDP for $T$ steps. We include the history of the atomic
proposition sequence in the state. Given an MDP $\mathcal{M} = (S, A, s_0, P)$ and a labeling
$L : S \to \Sigma$, the unrolled MDP $\mathcal{M}_T = (S', A', s_0', P')$ is such that

\[ S' = \bigcup_{t=0}^{T} S \times \underbrace{\Sigma \times \cdots \times \Sigma}_{t \text{ times}}, \]

$A' = A$, $P'((s, \sigma_0, \ldots, \sigma_{t-1}), a, (s', \sigma_0, \ldots, \sigma_{t-1}, \sigma_t)) = P(s, a, s')$ if $0 \le t \le T$
and $\sigma_t = L(s')$, and is 0 otherwise (the MDP goes to a sink state if $t > T$). The
leaves of the unrolled MDP are the states where $T$ timesteps have elapsed. In
these states, there is an associated finite word of length $T$. For a finite word of
length $T$, we define the value of any formula $\varphi$ to be zero beyond the end of the
trace, i.e. $[\varphi, \rho_{j:\infty}] = 0$ for any $j > T$. We then compute the value of the finite
words associated with the leaves, which is then considered as the reward at the
final step. We can use existing PAC algorithms to compute an $\varepsilon$-optimal policy
w.r.t. this reward for the finite horizon MDP $\mathcal{M}_T$, from which we can obtain a
$2\varepsilon$-optimal policy for $\mathcal{M}$ w.r.t. the specification $\varphi$.
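
The horizon selection used in the proof sketch is a one-line computation. The sketch below is illustrative: the function name `horizon` is hypothetical, and the final loop only guards against floating-point rounding at the boundary.

```python
import math


def horizon(eps: float, lam_max: float) -> int:
    """Return an integer T with lam_max^T <= eps, using T >= log(eps) / log(lam_max)."""
    if not (0.0 < eps < 1.0 and 0.0 < lam_max < 1.0):
        raise ValueError("expected 0 < eps < 1 and 0 < lam_max < 1")
    t = max(0, math.ceil(math.log(eps) / math.log(lam_max)))
    while lam_max ** t > eps:  # guard against floating-point rounding
        t += 1
    return t


if __name__ == "__main__":
    print(horizon(0.01, 0.9))
```

Note that the unrolled MDP keeps the whole label history in its state, so its size can grow exponentially in the returned horizon.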


4 Uniformly Discounted LTL to Reward Machines 


In general, optimal strategies for discounted LTL require infinite memory (Theo- 
rem 1). However, producing such an example required the use of multiple, varied 
discount factors. In this section, we will show that finite memory is sufficient 
for optimal policies under uniform discounting, where the discount factors for 
all temporal operators in the formula are the same. We will also provide an 
algorithm for computing these strategies. 

Our approach is to reduce uniformly discounted LTL formulas to reward
machines, which are finite state machines in which each transition is associated
with a reward. We show that the value of a given discounted LTL formula $\varphi$ for
an infinite word $\rho$ is the discounted-sum reward computed by a corresponding
reward machine.

Formally, a reward machine is a tuple $R = (Q, \delta, r, q_0, \lambda)$ where $Q$ is a finite
set of states, $\delta : Q \times \Sigma \to Q$ is the transition function, $r : Q \times \Sigma \to \mathbb{R}$ is
the reward function, $q_0 \in Q$ is the initial state, and $\lambda \in [0, 1)$ is the discount
factor. With any infinite word $\rho = \sigma_0 \sigma_1 \ldots \in \Sigma^\omega$, we can associate a sequence
of rewards $c_0 c_1 \ldots$ where $c_t = r(q_t, \sigma_t)$ with $q_t = \delta(q_{t-1}, \sigma_{t-1})$ for $t > 0$. We use
$R(\rho)$ to denote the discounted reward achieved by $\rho$,


\[ R(\rho) = \sum_{t=0}^{\infty} \lambda^t c_t, \]


and $R(w)$ to denote the partial discounted reward achieved by the finite word
$w = \sigma_0 \sigma_1 \ldots \sigma_T \in \Sigma^*$, i.e., $R(w) = \sum_{t=0}^{T} \lambda^t c_t$ where $c_t$ is the reward at time $t$.
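
As a concrete reading of this definition, the following minimal sketch represents a reward machine by its transition and reward functions and computes the partial discounted reward $R(w)$. The class name, field names, and the one-state toy machine in the usage part are assumptions made for illustration; letters are modeled as frozensets of proposition names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Iterable

Letter = FrozenSet[str]  # a letter is the set of propositions that hold


@dataclass(frozen=True)
class RewardMachine:
    delta: Callable[[Hashable, Letter], Hashable]  # transition function
    reward: Callable[[Hashable, Letter], float]    # reward function r
    q0: Hashable                                   # initial state
    lam: float                                     # discount factor in [0, 1)

    def partial_reward(self, word: Iterable[Letter]) -> float:
        """Sum of lam^t * r(q_t, sigma_t) over the letters of a finite word."""
        q, total, discount = self.q0, 0.0, 1.0
        for sigma in word:
            total += discount * self.reward(q, sigma)
            q = self.delta(q, sigma)
            discount *= self.lam
        return total


if __name__ == "__main__":
    # Toy one-state machine (illustration only): pays 1 - lam whenever p holds.
    lam = 0.5
    toy = RewardMachine(lambda q, s: q, lambda q, s: (1 - lam) if "p" in s else 0.0, "q", lam)
    print(toy.partial_reward([frozenset({"p"}), frozenset(), frozenset({"p"})]))
```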

Given a reward machine $R$ and an MDP $\mathcal{M}$, our objective is to maximize
the expected value $R(\rho)$ from the reward machine reading the word $\rho$ produced
by the MDP. Specifically, the value for a policy $\pi$ for $\mathcal{M}$ is


\[ \mathcal{J}^{\mathcal{M}}(\pi, R) = \mathbb{E}_{\rho \sim \mathcal{D}^{\mathcal{M}}_{\pi}}[R(\rho)] \]


where $\pi$ is optimal if $\mathcal{J}^{\mathcal{M}}(\pi, R) = \sup_{\pi'} \mathcal{J}^{\mathcal{M}}(\pi', R)$. Finding such an optimal
policy is straightforward: we consider the product of the reward machine $R$ with
the MDP $\mathcal{M}$ to form a product MDP with a discounted reward objective. In
the corresponding product MDP, we can compute optimal policies for maximizing
the expected discounted-sum reward using standard techniques such as
policy iteration and linear programming. If the transition function of the MDP
is unknown, this product can be formed on-the-fly and any RL algorithm for
discounted reward can be applied. Using the state space of the reward machine
as memory, we can then obtain a finite-memory policy that is optimal for $R$.
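
The product construction can be sketched as follows. This example uses value iteration rather than the policy iteration or linear programming mentioned above, purely for brevity. The three-state MDP, its action names, and the hand-written reward machine (intended to encode $X_\lambda p$) are illustrative assumptions; the machine reads the label of the current MDP state at each step.

```python
LAM = 0.9
STATES = ["s0", "s1", "s2"]
ACTIONS = ["a", "b"]
# P[(s, a)] is a list of (probability, successor) pairs; s1 and s2 are absorbing.
P = {("s0", "a"): [(1.0, "s1")], ("s0", "b"): [(1.0, "s2")],
     ("s1", "a"): [(1.0, "s1")], ("s1", "b"): [(1.0, "s1")],
     ("s2", "a"): [(1.0, "s2")], ("s2", "b"): [(1.0, "s2")]}
LABEL = {"s0": frozenset(), "s1": frozenset({"p"}), "s2": frozenset()}
RM_STATES = ["init", "q0", "q1", "q2"]


def rm_delta(q, sigma):
    if q == "init":
        return "q0"  # skip the first letter
    if q == "q0":
        return "q1" if "p" in sigma else "q2"
    return q  # q1 and q2 are sinks


def rm_reward(q, sigma):
    if q == "q0":
        return (1 - LAM) if "p" in sigma else 0.0
    return (1 - LAM) if q == "q1" else 0.0


def solve_product(iters=2000):
    """Value iteration on the product of the MDP with the reward machine."""
    def q_value(V, s, q, a):
        sigma = LABEL[s]
        q_next = rm_delta(q, sigma)
        future = sum(p * V[(s2, q_next)] for p, s2 in P[(s, a)])
        return rm_reward(q, sigma) + LAM * future

    V = {(s, q): 0.0 for s in STATES for q in RM_STATES}
    for _ in range(iters):
        V = {(s, q): max(q_value(V, s, q, a) for a in ACTIONS) for (s, q) in V}
    policy = {(s, q): max(ACTIONS, key=lambda a: q_value(V, s, q, a)) for (s, q) in V}
    return V, policy


if __name__ == "__main__":
    V, policy = solve_product()
    print(V[("s0", "init")], policy[("s0", "init")])
```

The returned policy is indexed by pairs of an MDP state and a reward machine state, which is exactly the sense in which the machine's state space serves as the policy's memory.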

We have the following theorem showing that we can construct a reward
machine $R_\varphi$ for every uniformly discounted LTL formula $\varphi$.


Theorem 3. For any uniformly discounted LTL formula $\varphi$, in which all temporal
operators use a common discount factor $\lambda$, we can construct a reward machine
$R_\varphi = (Q, \delta, r, q_0, \lambda)$ such that for any $\rho \in \Sigma^\omega$, we have $R_\varphi(\rho) = [\varphi, \rho]$.


We provide the reward machine construction for Theorem 3 in the next subsection.
Using this theorem, one can use a reward machine $R_\varphi$ that matches
the value of a particular uniformly discounted LTL formula $\varphi$, and then apply
the procedure outlined above for computing optimal finite-memory policies for
reward machines.


Corollary 1. For any MDP $\mathcal{M}$, labelling function $L$ and a discounted LTL
formula $\varphi$ in which all temporal operators use a common discount factor $\lambda$,
there exists a finite-memory optimal policy $\pi \in \Pi_{\mathrm{opt}}(\mathcal{M}, \varphi)$. Furthermore, there
is an algorithm to compute such a policy.


4.1 Reward Machine Construction 


For our construction, we examine the case of uniformly discounted LTL formulas
with positive discount factors $\lambda \in (0, 1)$. This allows us to divide by $\lambda$ in our
construction. We note that uniformly discounted LTL formulas with
$\lambda = 0$ can be evaluated after reading the initial letter of the word, and thus have
trivial reward machines.

The reward machine $R_\varphi$ constructed for the uniformly discounted LTL formula
$\varphi$ exhibits a special structure. Specifically, all edges within any given
strongly-connected component (SCC) of $R_\varphi$ share the same reward, which is
either 0 or $1 - \lambda$, while all other rewards fall within the range $[0, 1 - \lambda]$. We
present an inductive construction of the reward machines over the syntax of
discounted LTL that maintains these invariants.


Lemma 1. For any uniformly discounted LTL formula $\varphi$ there exists a reward
machine $R_\varphi = (Q, \delta, r, q_0, \lambda)$ such that the following hold:

$I_1$. For any $\rho \in \Sigma^\omega$, we have $R_\varphi(\rho) = [\varphi, \rho]$.
$I_2$. There is a partition of the states $Q = \bigcup_{\ell=1}^{L} Q_\ell$ and a type mapping $\chi : [L] \to \{0, 1 - \lambda\}$
such that for any $q \in Q_\ell$ and $\sigma \in \Sigma$,
(a) $\delta(q, \sigma) \in \bigcup_{m=\ell}^{L} Q_m$, and
(b) if $\delta(q, \sigma) \in Q_\ell$ then $r(q, \sigma) = \chi(\ell)$.
$I_3$. For any $q \in Q$ and $\sigma \in \Sigma$, we have $0 \le r(q, \sigma) \le 1 - \lambda$.


Fig. 4. Reward machines for $\varphi = p$ (left) and $\varphi = X_\lambda q$ (right). The transitions are
labeled by the guard and reward.


Our construction proceeds inductively. We define the reward machine for the
base case of a single atomic proposition, i.e. $\varphi = p$, and then the construction
for negation, the next operator, disjunction, the eventually operator (for ease of 
presentation), and the until operator. The ideas used in the constructions for dis- 
junction, the eventually operator, and the until operator build off of each other, 
as they all involve keeping track of the maximum/minimum value over a set of 
subformulas. We use properties J; and Iz to show correctness, and properties [5 
and I3 to show finiteness. A summary of the construction and detailed proofs 
can be found in the full version of this paper [4]. 


Atomic Propositions. Let $\varphi = p$ for some $p \in \mathcal{P}$. The reward machine $R_\varphi = (Q, \delta, r, q_0, \lambda)$
for $\varphi$ is such that $Q = \{q_0, q_1, q_2\}$ and $\delta(q, \sigma) = q$ for all $q \in \{q_1, q_2\}$
and $\sigma \in \Sigma$. The reward machine is shown in Fig. 4 where edges are labelled with
propositions and rewards. If $p \in \sigma$, $\delta(q_0, \sigma) = q_1$ and $r(q_0, \sigma) = 1 - \lambda$. If $p \notin \sigma$,
$\delta(q_0, \sigma) = q_2$ and $r(q_0, \sigma) = 0$. Finally, $r(q_1, \sigma) = 1 - \lambda$ and $r(q_2, \sigma) = 0$ for all
$\sigma \in \Sigma$. It is easy to see that $I_1$, $I_2$, and $I_3$ hold.


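Negation. Let $\varphi = \neg\varphi_1$ for some LTL formula $\varphi_1$ and let $R_{\varphi_1} = (Q, \delta, r, q_0, \lambda)$
be the reward machine for $\varphi_1$. Notice that the reward machine for $\varphi$ can be
constructed from $R_{\varphi_1}$ by simply replacing every reward $c$ with $(1 - \lambda) - c$
as $\sum_{i=0}^{\infty} \lambda^i (1 - \lambda) = 1$. Formally, $R_\varphi = (Q, \delta, r', q_0, \lambda)$ where $r'(q, \sigma) =
(1 - \lambda) - r(q, \sigma)$ for all $q \in Q$ and $\sigma \in \Sigma$. Again, assuming that invariants
$I_1$, $I_2$, and $I_3$ hold for $R_{\varphi_1}$, it easily follows that they hold for $R_\varphi$.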

Next Operator. Let $\varphi = X_\lambda \varphi_1$ for some $\varphi_1$ and let $R_{\varphi_1} = (Q, \delta, r, q_0, \lambda)$ be
the reward machine for $\varphi_1$. The reward machine for $\varphi$ can be constructed from
$R_{\varphi_1}$ by adding a new initial state $q_0'$ and a transition in the first step from it
to the initial state of $R_{\varphi_1}$. From the next step $R_\varphi$ simulates $R_{\varphi_1}$. This has the
resulting effect of skipping the first letter, and scaling the value by a factor of $\lambda$.
Formally, $R_\varphi = (\{q_0'\} \cup Q, \delta', r', q_0', \lambda)$ where $\delta'(q_0', \sigma) = q_0$ and $\delta'(q, \sigma) = \delta(q, \sigma)$ for
all $q \in Q$ and $\sigma \in \Sigma$. Similarly, $r'(q_0', \sigma) = 0$ and $r'(q, \sigma) = r(q, \sigma)$ for all $q \in Q$
and $\sigma \in \Sigma$. Assuming that invariants $I_1$, $I_2$, and $I_3$ hold for $R_{\varphi_1}$, it follows that
they hold for $R_\varphi$.
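
The three base constructions above (atomic proposition, negation, next) can be sketched directly. The class `RM`, the state names, and the use of exact `Fraction` arithmetic are choices made for this illustration; letters are modeled as frozensets of proposition names, and the fresh initial state of the next operator is assumed not to clash with existing state names.

```python
from fractions import Fraction


class RM:
    """Reward machine given by transition/reward functions, initial state, discount."""

    def __init__(self, delta, reward, q0, lam):
        self.delta, self.reward, self.q0, self.lam = delta, reward, q0, lam

    def value(self, word):
        """Partial discounted reward of a finite word (a sequence of letters)."""
        q, total, disc = self.q0, 0, 1
        for sigma in word:
            total += disc * self.reward(q, sigma)
            q = self.delta(q, sigma)
            disc *= self.lam
        return total


def atomic(p, lam):
    """Machine for the atomic proposition p: q0 branches to sink q1 or sink q2."""
    def delta(q, sigma):
        if q == "q0":
            return "q1" if p in sigma else "q2"
        return q

    def reward(q, sigma):
        if q == "q0":
            return 1 - lam if p in sigma else 0
        return 1 - lam if q == "q1" else 0

    return RM(delta, reward, "q0", lam)


def negation(m):
    """Replace every reward c by (1 - lam) - c."""
    return RM(m.delta, lambda q, s: (1 - m.lam) - m.reward(q, s), m.q0, m.lam)


def next_op(m):
    """Add a fresh initial state that reads one letter with reward 0."""
    init = ("init", m.q0)  # fresh state; assumed distinct from the states of m
    delta = lambda q, s: m.q0 if q == init else m.delta(q, s)
    reward = lambda q, s: 0 if q == init else m.reward(q, s)
    return RM(delta, reward, init, m.lam)


if __name__ == "__main__":
    lam = Fraction(1, 2)
    word = [frozenset(), frozenset({"p"})] + [frozenset()] * 8
    print(next_op(atomic("p", lam)).value(word))
```

On a finite word of length $n$ the partial reward of the atomic machine is $1 - \lambda^n$ when $p$ holds in the first letter, which tends to 1 as the word grows.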


Disjunction. Let $\varphi = \varphi_1 \vee \varphi_2$ for some $\varphi_1, \varphi_2$ and let $R_{\varphi_1} = (Q_1, \delta_1, r_1, q_0^1, \lambda)$
and $R_{\varphi_2} = (Q_2, \delta_2, r_2, q_0^2, \lambda)$ be the reward machines for $\varphi_1$ and $\varphi_2$, respectively.
The reward machine $R_\varphi = (Q, \delta, r, q_0, \lambda)$ is constructed from $R_{\varphi_1}$ and $R_{\varphi_2}$ such that
for any finite word it maintains the invariant that the discounted reward is
the maximum of the reward provided by $R_{\varphi_1}$ and $R_{\varphi_2}$. Moreover, once it is
ascertained that the reward provided by one machine cannot be overtaken by
the other for any suffix, $R_\varphi$ begins simulating the reward machine with higher
reward.

The construction involves a product construction along with a real-valued
component that stores a scaled difference between the total accumulated reward
for $\varphi_1$ and $\varphi_2$. In particular, $Q = (Q_1 \times Q_2 \times \mathbb{R}) \cup Q_1 \cup Q_2$ and $q_0 = (q_0^1, q_0^2, 0)$.
The reward deficit $\zeta$ of a state $q = (q_1, q_2, \zeta)$ denotes the difference between
the total accumulated reward for $\varphi_1$ and $\varphi_2$ divided by $\lambda^n$ where $n$ is the total
number of steps taken to reach $q$. The reward function is defined as follows.


- For $q = (q_1, q_2, \zeta)$, we let $f(q, \sigma) = r_1(q_1, \sigma) - r_2(q_2, \sigma) + \zeta$ denote the new
  (scaled) difference between the discounted-sum rewards accumulated by $R_{\varphi_1}$
  and $R_{\varphi_2}$. The current reward depends on whether $f(q, \sigma)$ is positive (accumulated
  reward from $R_{\varphi_1}$ is higher) or negative and whether its sign differs
  from that of $\zeta$. Formally,

\[ r(q, \sigma) = \begin{cases} r_1(q_1, \sigma) + \min\{0, \zeta\} & \text{if } f(q, \sigma) \ge 0 \\ r_2(q_2, \sigma) - \max\{0, \zeta\} & \text{if } f(q, \sigma) < 0 \end{cases} \]

- For a state $q_i \in Q_i$ we have $r(q_i, \sigma) = r_i(q_i, \sigma)$.


Now we need to make sure that $\zeta$ is updated correctly. We also want the transition
function to be such that the (reachable) state space is finite and the reward
machine satisfies $I_1$, $I_2$ and $I_3$.


- First, we make sure that, when the difference $\zeta$ is too large, the machine
  transitions to the appropriate state in $Q_1$ or $Q_2$. For a state $q = (q_1, q_2, \zeta)$
  with $|\zeta| \ge 1$, we have

\[ \delta(q, \sigma) = \begin{cases} \delta_1(q_1, \sigma) & \text{if } \zeta \ge 1 \\ \delta_2(q_2, \sigma) & \text{if } \zeta \le -1 \end{cases} \]

- For states with $|\zeta| < 1$, we simply advance both the states and update $\zeta$
  accordingly. Letting $f(q, \sigma) = r_1(q_1, \sigma) - r_2(q_2, \sigma) + \zeta$, we have that for a
  state $q = (q_1, q_2, \zeta)$ with $|\zeta| < 1$,

\[ \delta(q, \sigma) = (\delta_1(q_1, \sigma), \delta_2(q_2, \sigma), f(q, \sigma)/\lambda). \tag{2} \]

- Finally, for $q_i \in Q_i$, $\delta(q_i, \sigma) = \delta_i(q_i, \sigma)$.


Finiteness. We argue that the (reachable) state space of $R_\varphi$ is finite. Let $Q_i =
\bigcup_{\ell=1}^{L_i} Q_i^\ell$ for $i \in \{1, 2\}$ be the SCC decompositions of $Q_1$ and $Q_2$ that satisfy
property $I_2$ for $R_{\varphi_1}$ and $R_{\varphi_2}$ respectively. Intuitively, if $R_\varphi$ stays within
$Q_1^\ell \times Q_2^m \times \mathbb{R}$ for some $\ell \le L_1$ and $m \le L_2$, then the rewards from $R_{\varphi_1}$ and $R_{\varphi_2}$
are constant; this enables us to infer the reward machine ($R_{\varphi_1}$ or $R_{\varphi_2}$) with
the higher total accumulated reward in a finite amount of time, after which we
transition to $Q_1$ or $Q_2$. Hence the set of all possible values of $\zeta$ in a reachable
state $(q_1, q_2, \zeta) \in Q_1^\ell \times Q_2^m \times \mathbb{R}$ is finite. This can be shown by induction.


Property $I_1$. Intuitively, it suffices to show that $R_\varphi(w) = \max\{R_{\varphi_1}(w), R_{\varphi_2}(w)\}$
for every finite word $w \in \Sigma^*$. We show this property along with the fact that for
any $w \in \Sigma^*$ of length $n$, if the reward machine reaches a state $(q_1, q_2, \zeta)$, then
$\zeta = (R_{\varphi_1}(w) - R_{\varphi_2}(w))/\lambda^n$. This can be proved using induction on $n$.


Property $I_2$. This property is true if and only if for every SCC $C$ of $R_\varphi$ there is a
type $c \in \{0, 1 - \lambda\}$ such that if $\delta(q, \sigma) = q'$ for some $q, q' \in C$ and $\sigma \in \Sigma$, we have
$r(q, \sigma) = c$. From the definition of the transition function $\delta$, $C$ cannot contain
two states where one is of the form $(q_1, q_2, \zeta) \in Q_1 \times Q_2 \times \mathbb{R}$ and the other is
$q_i \in Q_i$ for some $i \in \{1, 2\}$. Now if $C$ is completely contained in $Q_i$ for some
$i$, we can conclude from the inductive hypothesis that the rewards within $C$ are
constant (and they are all either 0 or $1 - \lambda$). When all states of $C$ are contained
in $Q_1 \times Q_2 \times \mathbb{R}$, they must be contained in $\tilde{Q}_1 \times \tilde{Q}_2 \times \mathbb{R}$ where $\tilde{Q}_i$ is some SCC
of $R_{\varphi_i}$. In such a case, we can show that $|C| = 1$ and in the presence of a self
loop on a state within $C$, the reward must be either 0 or $1 - \lambda$.


Property $I_3$. We now show that all rewards are bounded between 0 and $(1 - \lambda)$.
Let $q = (q_1, q_2, \zeta)$ and $f(q, \sigma) = r_1(q_1, \sigma) - r_2(q_2, \sigma) + \zeta$. We show the bound
for the case when $f(q, \sigma) \ge 0$; the other case is similar. If $\zeta \ge 0$, then
$r(q, \sigma) = r_1(q_1, \sigma) \in [0, 1 - \lambda]$. If $\zeta < 0$, then $r(q, \sigma) \le r_1(q_1, \sigma) \le 1 - \lambda$ and

\[ r(q, \sigma) = r_1(q_1, \sigma) + \zeta = f(q, \sigma) + r_2(q_2, \sigma) \ge 0. \]

This concludes the construction for $\varphi_1 \vee \varphi_2$.
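
The disjunction construction can be exercised on small machines. The sketch below follows the reward and transition rules stated above, but the encoding is an assumption of this illustration: product states are tagged tuples `("both", q1, q2, zeta)`, simulated states are `("left", q1)` or `("right", q2)`, and the state space is generated lazily rather than enumerated. Exact `Fraction` arithmetic is used so that the invariant $R_\varphi(w) = \max\{R_{\varphi_1}(w), R_{\varphi_2}(w)\}$ can be compared exactly.

```python
from fractions import Fraction


class RM:
    def __init__(self, delta, reward, q0, lam):
        self.delta, self.reward, self.q0, self.lam = delta, reward, q0, lam

    def value(self, word):
        """Partial discounted reward of a finite word."""
        q, total, disc = self.q0, 0, 1
        for sigma in word:
            total += disc * self.reward(q, sigma)
            q = self.delta(q, sigma)
            disc *= self.lam
        return total


def atomic(p, lam):
    delta = lambda q, s: ("q1" if p in s else "q2") if q == "q0" else q
    reward = lambda q, s: (1 - lam) if (q == "q1" or (q == "q0" and p in s)) else 0
    return RM(delta, reward, "q0", lam)


def negation(m):
    return RM(m.delta, lambda q, s: (1 - m.lam) - m.reward(q, s), m.q0, m.lam)


def next_op(m):
    init = ("init", m.q0)  # fresh state, assumed distinct from the states of m
    return RM(lambda q, s: m.q0 if q == init else m.delta(q, s),
              lambda q, s: 0 if q == init else m.reward(q, s), init, m.lam)


def disjunction(m1, m2):
    lam = m1.lam  # both machines are assumed to share the same discount factor

    def f(q, sigma):  # new scaled difference before dividing by lam
        _, q1, q2, zeta = q
        return m1.reward(q1, sigma) - m2.reward(q2, sigma) + zeta

    def reward(q, sigma):
        if q[0] == "left":
            return m1.reward(q[1], sigma)
        if q[0] == "right":
            return m2.reward(q[1], sigma)
        _, q1, q2, zeta = q
        if f(q, sigma) >= 0:
            return m1.reward(q1, sigma) + min(0, zeta)
        return m2.reward(q2, sigma) - max(0, zeta)

    def delta(q, sigma):
        if q[0] == "left":
            return ("left", m1.delta(q[1], sigma))
        if q[0] == "right":
            return ("right", m2.delta(q[1], sigma))
        _, q1, q2, zeta = q
        if zeta >= 1:   # machine 2 can no longer overtake machine 1
            return ("left", m1.delta(q1, sigma))
        if zeta <= -1:  # machine 1 can no longer overtake machine 2
            return ("right", m2.delta(q2, sigma))
        return ("both", m1.delta(q1, sigma), m2.delta(q2, sigma), f(q, sigma) / lam)

    return RM(delta, reward, ("both", m1.q0, m2.q0, 0), lam)
```

A usage example: with `m1 = negation(next_op(atomic("p", lam)))` and `m2 = next_op(atomic("q", lam))`, the lead can switch from `m1` to `m2` along a word, which exercises the branch where the sign of the scaled difference changes.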


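Eventually Operator. For ease of presentation, we treat the until operator as
a generalization of the eventually operator $F_\lambda$ and present the eventually operator
first. We have that $\varphi = F_\lambda \varphi_1$ for some $\varphi_1$. Let $R_{\varphi_1} = (Q_1, \delta_1, r_1, q_0^1, \lambda)$ be the
reward machine for $\varphi_1$. Let $X_\lambda^i$ denote the operator $X_\lambda$ applied $i$ times. We begin
by noting that

\[ F_\lambda \varphi_1 = \bigvee_{i \ge 0} X_\lambda^i \varphi_1 = \varphi_1 \vee X_\lambda \varphi_1 \vee X_\lambda^2 \varphi_1 \vee \cdots \]

The idea of the construction is to keep track of the unrolling of this formula up
to the current timestep $n$,

\[ F_\lambda^n \varphi_1 = \bigvee_{n \ge i \ge 0} X_\lambda^i \varphi_1 = \varphi_1 \vee X_\lambda \varphi_1 \vee X_\lambda^2 \varphi_1 \vee \cdots \vee X_\lambda^n \varphi_1. \]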


For this, we will generalize the construction for disjunction. In the disjunction
construction, there were states of the form $(q_1, q_2, \zeta)$ where $\zeta$ was a bookkeeping
parameter that kept track of the difference between $R_{\varphi_1}(w)$ and $R_{\varphi_2}(w)$, namely,
$\zeta = (R_{\varphi_1}(w) - R_{\varphi_2}(w))/\lambda^n$ where $w \in \Sigma^*$ is some finite word of length $n$. To generalize
this notion to make a reward machine for $\max\{R_1, \ldots, R_k\}$, we will have
states of the form $\{(q_1, \zeta_1), \ldots, (q_k, \zeta_k)\}$ where $\zeta_i = (R_i(w) - \max_j R_j(w))/\lambda^n$.
When $\zeta_i \le -1$ then $R_i(w) + \lambda^n \le \max_j R_j(w)$ and we know that the associated
reward machine $R_i$ cannot be the maximum, so we drop it from our set. We also
note that the value of $X_\lambda^i \varphi_1$ can be determined by simply waiting $i$ steps before
starting the reward machine $R_{\varphi_1}$, i.e. $\lambda^i R_{\varphi_1}(\rho_{i:\infty}) = R_{X_\lambda^i \varphi_1}(\rho)$. This allows us
to perform a subset construction for this operator.

For a finite word $w = \sigma_0 \sigma_1 \ldots \sigma_n \in \Sigma^*$ and a nonnegative integer $k$, let
$w_{k:\infty}$ denote the subword $\sigma_k \ldots \sigma_n$, which equals the empty word $\epsilon$ if $k > n$.
We use the notation $[X_\lambda^k \varphi_1, w] = \lambda^k R_{\varphi_1}(w_{k:\infty})$ and define $[F_\lambda^k \varphi_1, w] =
\max_{k \ge i \ge 0} [X_\lambda^i \varphi_1, w]$, which represents the maximum value accumulated by the
reward machine of some formula of the form $X_\lambda^i \varphi_1$ with $i \le k$ on a finite word
$w$. The reward machine for $F_\lambda \varphi_1$ will consist of states of the form $(v, S)$, containing
a value $v$ for bookkeeping and a set $S$ that keeps track of the states of
all $R_{X_\lambda^i \varphi_1}$ that may still obtain the maximum given a finite prefix $w$ of length
$n$, i.e. reward machine states of all subformulas $X_\lambda^i \varphi_1$ for $n \ge i \ge 0$ that satisfy
$[X_\lambda^i \varphi_1, w] + \lambda^n > [F_\lambda^n \varphi_1, w]$, since $\lambda^n$ is the maximum additional reward obtainable
by any $\rho \in \Sigma^\omega$ with prefix $w$. The subset $S$ consists of elements of the form
$(q_i, \zeta_i) \in S$ where $q_i = \delta_1(q_0^1, w_{i:\infty})$ and $\zeta_i = ([X_\lambda^i \varphi_1, w] - [F_\lambda^n \varphi_1, w])/\lambda^n$, corresponding
to each subformula $X_\lambda^i \varphi_1$. The value $v = \max\{-1, -[F_\lambda^n \varphi_1, w]/\lambda^n\}$ is
a bookkeeping parameter used to initialize new elements in the set $S$ and to stop
adding elements to $S$ when $v \le -1$. We now present the construction formally.

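We form a reward machine $R_\varphi = (Q, \delta, r, q_0, \lambda)$ where $Q = \mathbb{R} \times 2^{Q_1 \times \mathbb{R}}$ and
$q_0 = (0, \{(q_0^1, 0)\})$. We define a few functions that ease defining our transition
function. Let $f(\zeta, q, \sigma) = r_1(q, \sigma) + \zeta$ and $m(S, \sigma) = \max_{(q_i, \zeta_i) \in S} f(\zeta_i, q_i, \sigma)$. For the
subset construction, we define

\[ \Delta(S, \sigma) = \bigcup_{(q, \zeta) \in S} \{(\delta_1(q, \sigma), \zeta') : \zeta' = (f(\zeta, q, \sigma) - m(S, \sigma))/\lambda > -1\}. \]

The transition function is

\[ \delta((v, S), \sigma) = \begin{cases} (v'(S, v, \sigma),\; \Delta(S, \sigma) \cup \{(q_0^1, v'(S, v, \sigma))\}) & \text{if } v'(S, v, \sigma) > -1 \\ (-1,\; \Delta(S, \sigma)) & \text{if } v'(S, v, \sigma) \le -1 \end{cases} \]

where $v'(S, v, \sigma) = (v - m(S, \sigma))/\lambda$. The reward function is $r((v, S), \sigma) = m(S, \sigma)$.

We now argue that $R_\varphi$ satisfies properties $I_1$, $I_2$ and $I_3$ and that the set of reachable
states in $R_\varphi$ is finite, assuming $R_{\varphi_1}$ satisfies $I_1$, $I_2$ and $I_3$.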


Finiteness. Consider states of the form $(v, S) \in Q$. If $v = 0$, then it must be that
$\zeta_i = 0$ for all $(q_i, \zeta_i) \in S$ since receiving a non-zero reward causes the value of
$v$ to become negative. There are only finitely many such states. If $-1 < v < 0$,
then we will reach a state $(v', S') \in Q$ with $v' = -1$ in at most $n$ steps, where
$n$ is such that $v/\lambda^n \le -1$. Therefore, the number of reachable states $(v, S)$ with
$-1 < v < 0$ is also finite. Also, the number of states of the form $(-1, S)$ that can
be initially reached (via paths consisting only of states of the form $(v, S')$ with
$v > -1$) is finite. Furthermore, upon reaching such a state $(-1, S)$, the reward
machine is similar to that of a disjunction (maximum) of $|S|$ reward machines.
From this we can conclude that the full reachable state space is finite.


Property $I_1$. The transition function is designed so that the following holds true:
for any finite word $w \in \Sigma^*$ of length $n$ and letter $\sigma \in \Sigma$, if $\delta(q_0, w) = (v, S)$,
then $m(S, \sigma) = ([F_\lambda^{n+1} \varphi_1, w\sigma] - [F_\lambda^n \varphi_1, w])/\lambda^n$. Since $r((v, S), \sigma) = m(S, \sigma)$,
we get that $R_\varphi(w) = [F_\lambda^n \varphi_1, w]$. Thus, $R_\varphi(\rho) = [F_\lambda \varphi_1, \rho]$ for any infinite
word $\rho \in \Sigma^\omega$. This property for $m(S, \sigma)$ follows from the preservation of all the
properties outlined in the above description of the construction.


Property $I_2$. Consider an SCC $C$ in $R_\varphi$ such that $(v, S) = \delta((v, S), w)$ for some
$(v, S) \in C$ and $w \in \Sigma^*$ of length $n > 0$. Note that if $-1 < v < 0$, then
$(v', S') = \delta((v, S), w)$ is such that $v' < v$. Thus, it must be that $v = 0$ or $v = -1$.
If $v = 0$, then all the rewards must be zero, since any nonzero reward results in
$v < 0$. If $v = -1$, then it must be that for any $(q_i, \zeta_i) \in S$, $q_i$ is in an SCC $C_i$ in
$R_{\varphi_1}$ with some reward type $c_i \in \{0, 1 - \lambda\}$. For all $\zeta_i$ to remain fixed (which is
necessary as otherwise some $\zeta_i$ strictly increases or decreases), it must be that
all $c_i$ are the same, say $c$. Thus, the reward type in $R_\varphi$ for SCC $C$ equals $c$.


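Property $I_3$. We can show that for any finite word $w \in \Sigma^*$ of length $n$ and
letter $\sigma \in \Sigma$, if $\delta(q_0, w) = (v, S)$, then the reward is $r((v, S), \sigma) = m(S, \sigma) =
([F_\lambda^{n+1} \varphi_1, w\sigma] - [F_\lambda^n \varphi_1, w])/\lambda^n$ using induction on $n$. Since property $I_3$ holds
for $R_{\varphi_1}$, we have that $0 \le ([F_\lambda^{n+1} \varphi_1, w\sigma] - [F_\lambda^n \varphi_1, w]) \le (1 - \lambda)$.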
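
The subset construction for the eventually operator can be prototyped and compared against the unrolled definition $[F_\lambda^n \varphi_1, w] = \max_{n \ge i \ge 0} \lambda^i R_{\varphi_1}(w_{i:\infty})$. The sketch below is an illustration under stated assumptions: the class `RM`, the helper `unrolled_value`, and the lazy generation of states are choices of this example, states $(v, S)$ are encoded as a pair of a number and a frozenset, and exact `Fraction` arithmetic is used so that values can be compared exactly.

```python
from fractions import Fraction


class RM:
    def __init__(self, delta, reward, q0, lam):
        self.delta, self.reward, self.q0, self.lam = delta, reward, q0, lam

    def value(self, word):
        """Partial discounted reward of a finite word."""
        q, total, disc = self.q0, 0, 1
        for sigma in word:
            total += disc * self.reward(q, sigma)
            q = self.delta(q, sigma)
            disc *= self.lam
        return total


def atomic(p, lam):
    delta = lambda q, s: ("q1" if p in s else "q2") if q == "q0" else q
    reward = lambda q, s: (1 - lam) if (q == "q1" or (q == "q0" and p in s)) else 0
    return RM(delta, reward, "q0", lam)


def eventually(m1):
    lam = m1.lam

    def m(S, sigma):  # best scaled increment among the tracked copies
        return max(m1.reward(q, sigma) + zeta for q, zeta in S)

    def step_set(S, sigma):  # Delta(S, sigma): advance copies, drop hopeless ones
        best, out = m(S, sigma), set()
        for q, zeta in S:
            zeta_new = (m1.reward(q, sigma) + zeta - best) / lam
            if zeta_new > -1:
                out.add((m1.delta(q, sigma), zeta_new))
        return frozenset(out)

    def delta(state, sigma):
        v, S = state
        v_new = (v - m(S, sigma)) / lam
        if v_new > -1:  # a copy started now could still become the maximum
            return (v_new, step_set(S, sigma) | {(m1.q0, v_new)})
        return (-1, step_set(S, sigma))

    def reward(state, sigma):
        return m(state[1], sigma)

    return RM(delta, reward, (0, frozenset({(m1.q0, 0)})), lam)


def unrolled_value(m1, word):
    """Brute-force max over i of lam^i * R(word[i:]), used as a reference."""
    return max(m1.lam ** i * m1.value(word[i:]) for i in range(len(word) + 1))
```

A usage example: `eventually(atomic("p", Fraction(1, 2))).value(w)` agrees with `unrolled_value(atomic("p", Fraction(1, 2)), w)` on every finite word `w` tried in the accompanying checks.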

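Until Operator. We now present the until operator, generalizing the ideas
presented for the eventually operator. We have that $\varphi = \varphi_1 U_\lambda \varphi_2$ for some $\varphi_1$
and $\varphi_2$. Let $R_{\varphi_1} = (Q_1, \delta_1, r_1, q_0^1, \lambda)$ and $R_{\varphi_2} = (Q_2, \delta_2, r_2, q_0^2, \lambda)$. Note that

\begin{align*}
\varphi_1 U_\lambda \varphi_2 &= \bigvee_{i \ge 0} (X_\lambda^i \varphi_2 \wedge \varphi_1 \wedge X_\lambda \varphi_1 \wedge \cdots \wedge X_\lambda^{i-1} \varphi_1) \\
&= \varphi_2 \vee (X_\lambda \varphi_2 \wedge \varphi_1) \vee (X_\lambda^2 \varphi_2 \wedge \varphi_1 \wedge X_\lambda \varphi_1) \vee \cdots
\end{align*}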

The goal of the construction is to keep track of the unrolling of this formula up
to the current timestep $n$,

\[ \varphi_1 U_\lambda^n \varphi_2 = \bigvee_{n \ge i \ge 0} (X_\lambda^i \varphi_2 \wedge \varphi_1 \wedge X_\lambda \varphi_1 \wedge \cdots \wedge X_\lambda^{i-1} \varphi_1) = \bigvee_{n \ge i \ge 0} \psi_i. \]

Each $\psi_i$ requires a subset construction in the style of the eventually operator
construction to maintain the minimum. We then nest another subset construction
in the style of the eventually operator construction to maintain the
maximum over $\psi_i$. For a finite word $w \in \Sigma^*$, we use the notation $[\psi_i, w]$ and
$[\varphi_1 U_\lambda^k \varphi_2, w]$ for the value accumulated by the reward machines corresponding to
these formulas on the word $w$, i.e. $[\psi_i, w] = \min\{[X_\lambda^i \varphi_2, w], \min_{i > j \ge 0}\{[X_\lambda^j \varphi_1, w]\}\}$
and $[\varphi_1 U_\lambda^k \varphi_2, w] = \max_{k \ge i \ge 0} [\psi_i, w]$.

Let $\mathcal{S} = 2^{(Q_1 \cup Q_2) \times \mathbb{R}}$ be the set of subsets containing $(q, \zeta)$ pairs, where $q$
may be from either $Q_1$ or $Q_2$. The reward machine consists of states of the
form $(v, I, \mathcal{X})$ where the value $v \in \mathbb{R}$ and the subset $I \in \mathcal{S}$ are for bookkeeping,
and $\mathcal{X} \in 2^{\mathcal{S}}$ is a subset of subsets for each $\psi_i$. Specifically, each element of
$\mathcal{X}$ is a subset $S$ corresponding to a particular $\psi_i$ which may still obtain the
maximum, i.e. $[\psi_i, w] + \lambda^n > [\varphi_1 U_\lambda^n \varphi_2, w]$. Each element of $S$ is of the form
$(q, \zeta)$. We have that $q \in Q_2$ for at most one element where $q = \delta_2(q_0^2, w_{k:\infty})$
and $\zeta = ([X_\lambda^k \varphi_2, w] - [\varphi_1 U_\lambda^n \varphi_2, w])/\lambda^n$. For the other elements of $S$, we have
that $q \in Q_1$ with $q = \delta_1(q_0^1, w_{k:\infty})$ and $\zeta = ([X_\lambda^k \varphi_1, w] - [\varphi_1 U_\lambda^n \varphi_2, w])/\lambda^n$.
If for any of these elements, the value of its corresponding formula becomes
too large to be the minimum for the conjunction forming $\psi_i$, i.e. $[\psi_i, w] + \lambda^n \le
[\varphi_1 U_\lambda^n \varphi_2, w] + \lambda^n \le [X_\lambda^k \varphi_i, w]$, which occurs when $\zeta \ge 1$, that element is dropped
from $S$. In order to update $\mathcal{X}$, we add a new $S$ corresponding to $\psi_n$ on the next
timestep. The value $v = \max\{-1, -[\varphi_1 U_\lambda^n \varphi_2, w]/\lambda^n\}$ is a bookkeeping parameter for
initializing new elements in the subsets and for stopping the addition of new
elements when $v \le -1$. The subset $I$ is a bookkeeping parameter that keeps
track of the subset construction for $\bigwedge_{n > i \ge 0} X_\lambda^i \varphi_1$, which is used to initialize the
addition of a subset corresponding to $\psi_n = X_\lambda^n \varphi_2 \wedge (\bigwedge_{n > i \ge 0} X_\lambda^i \varphi_1)$. We now
define the reward machine formally.

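We define a few functions that ease defining our transition function. We define
$\delta_*(q, \sigma) = \delta_i(q, \sigma)$ and $f_*(\zeta, q, \sigma) = r_i(q, \sigma) + \zeta$ if $q \in Q_i$ for $i \in \{1, 2\}$. We also
define $n(S, \sigma) = \min_{(q_i, \zeta_i) \in S} f_*(\zeta_i, q_i, \sigma)$ and $m(\mathcal{X}, \sigma) = \max_{S \in \mathcal{X}} n(S, \sigma)$. For the
subset construction, we define

\[ \Delta(S, \sigma, m) = \bigcup_{(q, \zeta) \in S} \{(\delta_*(q, \sigma), \zeta') : \zeta' < 1\} \]

where $\zeta' = (f_*(\zeta, q, \sigma) - m)/\lambda$ and

\[ T(\mathcal{X}, \sigma, m) = \bigcup_{S \in \mathcal{X}} \{\Delta(S, \sigma, m) : n(S, \sigma) > -1\}. \]

We form a reward machine $R_\varphi = (Q, \delta, r, q_0, \lambda)$ where $Q = \mathbb{R} \times \mathcal{S} \times 2^{\mathcal{S}}$ and
$q_0 = (0, \emptyset, \{\{(q_0^2, 0)\}\})$. The transition function is

\[ \delta((v, I, \mathcal{X}), \sigma) = \begin{cases} (v',\; I',\; T(\mathcal{X}, \sigma, m) \cup \{I' \cup \{(q_0^2, v')\}\}) & \text{if } v' > -1 \\ (-1,\; \emptyset,\; T(\mathcal{X}, \sigma, m)) & \text{if } v' \le -1 \end{cases} \]

where $m = m(\mathcal{X}, \sigma)$, $v' = (v - m)/\lambda$, and $I' = \Delta(I \cup \{(q_0^1, v')\}, \sigma, m)$. The reward
function is $r((v, I, \mathcal{X}), \sigma) = m(\mathcal{X}, \sigma)$.

We now show a sketch of correctness, which mimics the proof for the eventually
operator closely.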


Finiteness. Consider states of the form $(v, I, \mathcal{X}) \in Q$. If $v = 0$, then for all $S \in \mathcal{X}$
and $(q_i, \zeta_i) \in S$ it must be that $\zeta_i = 0$ since receiving a non-zero reward causes
the value of $v$ to become negative. Similarly, all $\zeta_i = 0$ for $(q_i, \zeta_i) \in I$ when $v = 0$.
There are only finitely many such states. If $-1 < v < 0$, then we will reach a
state $(v', I', \mathcal{X}') \in Q$ with $v' = -1$ in at most $n$ steps, where $n$ is such that
$v/\lambda^n \le -1$. Therefore, the number of reachable states with $-1 < v < 0$ is also finite.
Additionally, the number of states where $v = -1$ that can be initially reached is
finite. Upon reaching such a state $(-1, \emptyset, \mathcal{X}')$, the reward machine is similar to
that of the finite disjunction of reward machines for finite conjunctions.


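Property $I_1$. The transition function is designed so that the following holds true:
for any finite word $w \in \Sigma^*$ of length $n$ and letter $\sigma \in \Sigma$, if $\delta(q_0, w) = (v, I, \mathcal{X})$,
then $m(\mathcal{X}, \sigma) = ([\varphi_1 U_\lambda^{n+1} \varphi_2, w\sigma] - [\varphi_1 U_\lambda^n \varphi_2, w])/\lambda^n$. Since $r((v, I, \mathcal{X}), \sigma) =
m(\mathcal{X}, \sigma)$, we get that $R_\varphi(w) = [\varphi_1 U_\lambda^n \varphi_2, w]$. Thus, $R_\varphi(\rho) = [\varphi_1 U_\lambda \varphi_2, \rho]$ for
any infinite word $\rho \in \Sigma^\omega$. This property for $m(\mathcal{X}, \sigma)$ follows from the properties
outlined in the construction, which can be shown inductively.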


Property $I_2$. Consider an SCC $C$ of $R_\varphi$ and a state $(v, I, \mathcal{X}) \in C$. If $v = 0$, then
we must receive zero reward because non-zero reward causes the value of $v$ to
become negative. It cannot be that $-1 < v < 0$ since if $v < 0$, we reach a state
$(v', I', \mathcal{X}') \in Q$ with $v' = -1$ in at most $n$ steps, where $n$ is such that $v/\lambda^n \le -1$.
If $v = -1$, then we have a state of the form $(-1, \emptyset, \mathcal{X})$. For this to be an SCC,
all elements of the form $(q_k, \zeta_k) \in S$ for $S \in \mathcal{X}$ must be such that $q_k$ is in an
SCC of its respective reward machine (either $R_{\varphi_1}$ or $R_{\varphi_2}$) with reward type
$t_k \in \{0, 1 - \lambda\}$. Additionally, there cannot be a $t_j \ne t_k$, as otherwise there would
be a $\zeta_k$ that changes following a cycle in the SCC $C$. Thus, the reward for this
SCC $C$ is $t_k$.


Property $I_3$. This property can be shown by recalling the property above that
$r((v, I, \mathcal{X}), \sigma) = m(\mathcal{X}, \sigma) = ([\varphi_1 U_\lambda^{n+1} \varphi_2, w\sigma] - [\varphi_1 U_\lambda^n \varphi_2, w])/\lambda^n$.


5 Conclusion 


This paper studied policy synthesis for discounted LTL in MDPs with unknown
transition probabilities. Unlike LTL, discounted LTL provides an insensitivity
to small perturbations of the transition probabilities which enables PAC learning
without additional assumptions. We outlined a PAC learning algorithm for
discounted LTL that uses finite memory. We showed that optimal strategies for
discounted LTL require infinite memory in general due to the need to balance
the values of multiple competing objectives. To avoid this infinite memory, we
examined the case of uniformly discounted LTL, where the discount factors for
all temporal operators are identical. We showed how to translate uniformly discounted
LTL formulas to finite state reward machines. This construction shows
that finite memory is sufficient, and provides an avenue to use discounted reward
algorithms, such as reinforcement learning, for computing optimal policies for
uniformly discounted LTL formulas.


Abstract. We present a novel method to compute permissive winning
strategies in two-player games over finite graphs with $\omega$-regular winning
conditions. Given a game graph $G$ and a parity winning condition $\Phi$,
we compute a winning strategy template $\Psi$ that collects an infinite number
of winning strategies for objective $\Phi$ in a concise data structure. We
use this new representation of sets of winning strategies to tackle two
problems arising from applications of two-player games in the context
of cyber-physical system design: (i) incremental synthesis, i.e., adapting
strategies to newly arriving, additional $\omega$-regular objectives $\Phi'$, and
(ii) fault-tolerant control, i.e., adapting strategies to the occasional or
persistent unavailability of actuators. The main features of our strategy
templates, which we utilize for solving these challenges, are their
easy computability, adaptability, and compositionality. For incremental
synthesis, we empirically show on a large set of benchmarks that our
technique vastly outperforms existing approaches if the number of added
specifications increases. While our method is not complete, our prototype
implementation returns the full winning region in all 1400 benchmark
instances, i.e. handling a large problem class efficiently in practice.


1 Introduction 


Two-player $\omega$-regular games on finite graphs are an established modeling and
solution formalism for many challenging problems in the context of correct-by-construction
cyber-physical system (CPS) design [2,7,39]. Here, control software
actuating a technical system “plays” against the physical environment. The winning
strategy of the system player in this two-player game results in software
which ensures that the controlled technical system fulfills a given temporal specification
for any (possible) event or input sequence generated by the environment.
Examples include warehouse robot coordination [36], reconfigurable manufacturing
systems [26], and adaptive cruise control [33]. In these applications, the
technical system under control, as well as its requirements, are developing and
changing during the design process. It is therefore desirable to allow for maintainable
and adaptable control software. This, in turn, requires solution algorithms
for two-player $\omega$-regular games which allow for this adaptability.

Fig. 1. Experimental results over 1400 generalized parity games comparing the performance
of our tool PESTEL against the state-of-the-art generalized parity solver
GENZIEL [16]. Data points give the average execution time (in ms) over all instances
with the same number of parity objectives. Left: all objectives are given upfront. Right:
objectives are added one-by-one. See Sect. 6 for more details on those experiments.

This paper addresses this challenge by providing a new algorithm to efficiently
compute permissive winning strategy templates in parity games which enable
rich strategy adaptations. Given a game graph $G = (V, E)$ and an objective $\Phi$,
a winning strategy template $\Psi$ characterizes the winning region $\mathcal{W}_0 \subseteq V$ along
with three types of local edge conditions: a safety, a co-live, and a live-group
template. The conjunction of these basic templates allows us to capture infinitely
many winning strategies over $G$ w.r.t. $\Phi$ in a simple data structure that is both
(i) easy to obtain during synthesis, and (ii) easy to adapt and compose.

We showcase the usefulness of permissive winning strategy templates in the
context of CPS design by two application scenarios: (i) incremental synthesis,
where strategies need to be adapted to newly arriving additional $\omega$-regular
objectives $\Phi'$, and (ii) fault-tolerant control, where strategies need to be adapted to
the occasional or persistent unavailability of actuators, i.e., system player edges.

We have implemented our algorithms in a prototype tool PESTEL and run it
on more than 1400 benchmarks adapted from the SYNTCOMP benchmark suite
[21]. These experiments show that our class of templates effectively avoids
re-computations for the required strategy adaptations. For incremental synthesis, our
experimental results are previewed in Fig. 1, where we compare PESTEL against
the state-of-the-art solver GENZIEL [16] for generalized parity objectives, i.e., finite
conjunctions of parity objectives. We see that PESTEL is as efficient as GENZIEL
whenever all conjuncts of the objective are given up-front (Fig. 1 (left)), even
outperforming it in more than 90% of the instances. Whenever conjuncts of the
objective arrive one at a time, PESTEL outperforms the existing approaches significantly
if the number of objectives increases (Fig. 1 (right)). This shows the potential of
PESTEL towards more adaptable and maintainable control software for CPS.
Fig. 2. A two-player game graph with Player 1 (squares) and Player 0 (circles) vertices,
different winning conditions $\Phi_i$, and corresponding winning strategy templates $\Psi_i$.

Illustrative Example. To appreciate the simplicity and easy adaptability of
our strategy templates, consider the game graph in Fig. 2 (left). The first winning
condition $\Phi_1$ requires vertex $f$ to never be seen along a play. This can be enforced
by Player 0 from vertices $\mathcal{W}_0 = \{a, b, c, d\}$, called the winning region. The safety
template $\Psi_1$ ensures that the game always stays in $\mathcal{W}_0$ by forcing the edge $e_{de}$ to
never be taken. It is easy to see that every Player 0 strategy that follows this rule
results in plays which are winning if they start in $\mathcal{W}_0$. Now consider the second
winning condition $\Phi_2$, which requires vertex $c$ or $d$ to be seen infinitely often.
This induces the live-group template $\Psi_2$, which requires that whenever vertex $a$
is seen infinitely often, either edge $e_{ac}$ or edge $e_{ad}$ needs to be taken infinitely
often. It is easy to see that any strategy that complies with this edge-condition
is winning for Player 0 from every vertex and there are infinitely many such
compliant winning strategies. Finally, we consider condition $\Phi_3$ requiring vertex
$b$ to be seen only finitely often. This induces the strategy template $\Psi_3$, which
is a co-liveness template requiring that all edges from Player 0 vertices which
unavoidably lead to $b$ (i.e., $e_{ab}$, $e_{db}$, and $e_{de}$) are taken only finitely often. We can
now combine all templates into a new template $\Psi' = \Psi_1 \wedge \Psi_2 \wedge \Psi_3$ and observe
that all strategies compliant with $\Psi'$ are winning for $\Phi' = \Phi_1 \wedge \Phi_2 \wedge \Phi_3$.

In addition to their compositionality, strategy templates also allow for local
strategy adaptations in case of edge unavailability faults. Consider again the
game in Fig. 2 with the objective $\Phi_2$. Suppose that Player 0 follows the strategy
$\pi$: $a \mapsto d$ and $d \mapsto a$, which is compliant with $\Psi_2$. If the edge $e_{ad}$ becomes
unavailable, we would need to re-solve the game for the modified game graph
$G' = (V, E \setminus \{e_{ad}\})$. However, given the strategy template $\Psi_2$, we see that the
strategy $\pi'$: $a \mapsto c$ and $d \mapsto a$ is actually compliant with $\Psi_2$ over $G'$. This allows
us to obtain a new strategy without re-solving the game.

While these examples demonstrate the potential of templates for strategy 
adaptation, there exist scenarios where conflicts between templates or graph 
modifications arise, which require re-computations. Our empirical results, how- 
ever, show that such conflicts rarely appear in practical benchmarks. This sug- 
gests that our technique can handle a large problem class efficiently in practice. 


Related Work. The class of templates we use was introduced in [4] and utilized 
to represent environment assumptions that enable a system to fulfill its specifi- 
cations in a cooperative setting. Contrary to [4], this paper uses the same class 
of templates to represent the system’s winning strategies in a zero-sum setting. 
While the computation of permissive strategies for the control of CPS is
an established concept in the field of supervisory control¹ [14,42], it has also
been addressed in reactive synthesis where the considered specification class is 
typically more expressive, e.g., Bernet et al. [8] introduce permissive strategies 
that encompass all the behaviors of positional strategies and Neider et al. [31] 
introduce permissiveness to subsume strategies that visit losing loops at most 
twice. Finally, Bouyer et al. [11] take a quantitative approach to measure the 
permissiveness of strategies, by minimizing the penalty of not being permissive. 
However, all these approaches are not optimized towards strategy adaptation and 
thereby typically fail to preserve enough behaviors to be able to effectively satisfy 
subsequent objectives. A notable exception is a work by Baier et al. [23]. While
their strategy templates are more complicated and more costly to compute than
ours, they are maximally permissive (i.e., capture all winning strategies in the
game). However, when composing multiple objectives, they restrict templates 
substantially which eliminates many compositional solutions that our method 
retains. This results in higher computation times and lower result quality for 
incremental synthesis compared to our approach. As no implementation of their 
method is available, we could not compare both approaches empirically. 

Even without the incremental aspect, synthesizing winning strategies for
conjunctions of $\omega$-regular objectives is known to be a hard problem: Chatterjee
et al. [16] prove that the conjunction of even two parity objectives makes the
problem NP-complete. They provide a generalization of Zielonka’s algorithm,
called GENZIEL, for generalized parity objectives (i.e., finite conjunctions of
parity objectives), which is compared to our tool PESTEL in Fig. 1. While PESTEL
is (in contrast to GENZIEL) not complete, i.e., there exist realizable synthesis
problems for which PESTEL returns no solution, our prototype implementation
returns the full winning region in all 1400 benchmark instances.

Fault-tolerant control is a well-established topic in control engineering [9], 
with recent emphasis on the logical control layer [19,30]. While most of this work 
is conducted in the context of supervisory control, there are also some approaches 
in reactive synthesis. While [29,32] consider the addition of “disturbance edges”
and synthesize a strategy that tolerates as many of them as possible, we look
at the complementary problem, where edges, in particular system-player edges,
disappear. To the best of our knowledge, the only algorithm that is able to tackle
this problem without re-computation considers Büchi games [15]. In contrast, our
method is applicable to the more expressive class of parity games.


2 Preliminaries 


Notation. We use $\mathbb{N}$ to denote the set of natural numbers including zero.
Given two natural numbers $a, b \in \mathbb{N}$ with $a \leq b$, we use $[a;b]$ to denote the
set $\{n \in \mathbb{N} \mid a \leq n \leq b\}$. For any given set $[a;b]$, we write $i \in_{\mathrm{even}} [a;b]$ and
$i \in_{\mathrm{odd}} [a;b]$ as shorthand for $i \in [a;b] \cap \{0,2,4,\ldots\}$ and $i \in [a;b] \cap \{1,3,5,\ldots\}$,
respectively. Given two sets $A$ and $B$, a relation $R \subseteq A \times B$, and an element
$a \in A$, we write $R(a)$ to denote the set $\{b \in B \mid (a,b) \in R\}$.

¹ See [18,28,37] for connections between supervisory control and reactive synthesis.


Languages. Let $\Sigma$ be a finite alphabet. The notations $\Sigma^*$ and $\Sigma^\omega$ respectively
denote the sets of finite and infinite words over $\Sigma$, and $\Sigma^\infty$ is equal to $\Sigma^* \cup \Sigma^\omega$.
For any word $w \in \Sigma^\infty$, $w_i$ denotes the $i$-th symbol in $w$. Given two words $u \in \Sigma^*$
and $v \in \Sigma^\infty$, the concatenation of $u$ and $v$ is written as the word $uv$.


Game Graphs. A game graph is a tuple $G = (V = V^0 \mathbin{\dot\cup} V^1, E)$ where $(V, E)$
is a finite directed graph with vertices $V$ and edges $E$, and $V^0, V^1 \subseteq V$ form a
partition of $V$. Without loss of generality, we assume that for every $v \in V$ there
exists $v' \in V$ s.t. $(v, v') \in E$. A play originating at a vertex $v_0$ is a finite or
infinite sequence of vertices $\rho = v_0 v_1 \ldots \in V^\infty$.


Winning Conditions/Objectives. Given a game graph $G$, we consider winning
conditions/objectives specified using a formula $\Phi$ in linear temporal logic
(LTL) over the vertex set $V$, that is, we consider LTL formulas whose atomic
propositions are sets of vertices $V$. In this case the set of desired infinite plays
is given by the semantics of $\Phi$, which is an $\omega$-regular language $\mathcal{L}(\Phi) \subseteq V^\omega$.
Every game graph with an arbitrary $\omega$-regular set of desired infinite plays can
be reduced to a game graph (possibly with a different set of vertices) with an
LTL winning condition, as above. The standard definitions of $\omega$-regular languages
and LTL are omitted for brevity and can be found in standard textbooks
[6]. To simplify notation we use $e = (u,v)$ in LTL formulas as syntactic sugar
for $u \wedge \bigcirc v$, with $\bigcirc$ as the LTL next operator. We further use a set of edges
$E' = \{e_i\}_{i \in [0;k]}$ as atomic proposition to denote $\bigvee_{i \in [0;k]} e_i$.


Games and Strategies. A two-player (turn-based) game is a pair $\mathcal{G} = (G, \Phi)$
where $G$ is a game graph and $\Phi$ is a winning condition over $G$. A strategy of
Player $i$, $i \in \{0,1\}$, is a function $\pi^i : V^* V^i \to V$ such that for every $\rho v \in V^* V^i$
it holds that $\pi^i(\rho v) \in E(v)$. Given a strategy $\pi^i$, we say that the play $\rho = v_0 v_1 \ldots$
is compliant with $\pi^i$ if $v_{k-1} \in V^i$ implies $v_k = \pi^i(v_0 \ldots v_{k-1})$ for all $k$. We refer
to a play compliant with $\pi^i$ and a play compliant with both $\pi^0$ and $\pi^1$ as a
$\pi^i$-play and a $\pi^0\pi^1$-play, respectively. We collect all plays originating in a set $S$
and compliant with $\pi^i$ (and compliant with both $\pi^0$ and $\pi^1$) in the sets $\mathcal{L}(S, \pi^i)$
(and $\mathcal{L}(S, \pi^0\pi^1)$, respectively). When $S = V$, we drop the mention of the set
in the previous notation, and when $S$ is a singleton $\{v\}$, we simply write $\mathcal{L}(v, \pi^i)$
(and $\mathcal{L}(v, \pi^0\pi^1)$, respectively).


Winning. Given a game $\mathcal{G} = (G, \Phi)$, a play $\rho$ in $\mathcal{G}$ is winning for Player 0 if
$\rho \in \mathcal{L}(\Phi)$, and it is winning for Player 1 otherwise. A strategy $\pi^i$ for Player $i$ is
winning from a vertex $v \in V$ if all plays compliant with $\pi^i$ and originating from
$v$ are winning for Player $i$. We say that a vertex $v \in V$ is winning for Player $i$
if there exists a winning strategy $\pi^i$ from $v$. We collect all winning vertices of
Player $i$ in the Player $i$ winning region $\mathcal{W}_i \subseteq V$. We always interpret winning
w.r.t. Player 0 if not stated otherwise.


Strategy Templates. Let $\pi^0$ be a Player 0 strategy and $\Phi$ be an LTL formula.
Then we say $\pi^0$ follows $\Phi$, denoted $\pi^0 \vDash \Phi$, if for all $\pi^0$-plays $\rho$, $\rho$ belongs to
$\mathcal{L}(\Phi)$, i.e., $\mathcal{L}(\pi^0) \subseteq \mathcal{L}(\Phi)$. We refer to a set $\Psi = \{\Psi_1, \ldots, \Psi_k\}$ of LTL formulas as
strategy templates representing the set of strategies that follow $\Psi_1 \wedge \ldots \wedge \Psi_k$.
We say a strategy template $\Psi$ is winning from a vertex $v$ for a game $(G, \Phi)$ if
every Player 0 strategy following the template $\Psi$ is winning from $v$. Moreover,
we say a strategy template $\Psi$ is winning if it is winning from every vertex in $\mathcal{W}_0$.
In addition, we call $\Psi$ maximally permissive for $\mathcal{G}$ if every Player 0 strategy $\pi$
which is winning in $\mathcal{G}$ also follows $\Psi$. With slight abuse of notation, we use $\Psi$ for
the set of formulas $\{\Psi_1, \ldots, \Psi_k\}$ and the formula $\Psi_1 \wedge \ldots \wedge \Psi_k$ interchangeably.


Set Transformers. Let $G = (V = V^0 \mathbin{\dot\cup} V^1, E)$ be a game graph, $U \subseteq V$ be a
subset of vertices, and $a \in \{0,1\}$ be the player index. Then

$$\mathrm{upre}_G(U) = \{v \in V \mid \forall (v,u) \in E.\ u \in U\} \tag{1}$$
$$\mathrm{cpre}^a_G(U) = \{v \in V^a \mid \exists (v,u) \in E.\ u \in U\} \cup \{v \in V^{1-a} \mid v \in \mathrm{upre}_G(U)\} \tag{2}$$


The universal predecessor operator $\mathrm{upre}_G(U)$ computes the set of vertices with
all the successors in $U$ and the controllable predecessor operator $\mathrm{cpre}^a_G(U)$ the
vertices from which Player $a$ can force visiting $U$ in exactly one step. In the
following, we introduce two types of attractor operators: $\mathrm{attr}^a_G(U)$ that computes
the set of vertices from which Player $a$ can force at least a single visit to $U$ in
finitely many steps, and the universal attractor $\mathrm{uattr}_G(U)$ that computes the set
of vertices from which both players are forced to visit $U$. For the following, let
$\mathrm{pre} \in \{\mathrm{upre}, \mathrm{cpre}^a\}$.


$$\mathrm{pre}^1_G(U) = \mathrm{pre}_G(U) \cup U, \qquad \mathrm{pre}^i_G(U) = \mathrm{pre}_G(\mathrm{pre}^{i-1}_G(U)) \cup \mathrm{pre}^{i-1}_G(U) \tag{3}$$
$$\mathrm{attr}^a_G(U) = \bigcup_{i \geq 1} \mathrm{cpre}^{a,i}_G(U), \qquad \mathrm{uattr}_G(U) = \bigcup_{i \geq 1} \mathrm{upre}^i_G(U) \tag{4}$$
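
The set transformers above can be sketched in a few lines of Python. This is only an illustration under assumptions made here: the game graph is encoded as a successor map `succ` and an ownership map `owner` (both names are hypothetical), the toy graph is not one of the paper's figures, and the fixpoint iteration is the naive one rather than a linear-time implementation.

```python
from typing import Dict, Hashable, Set

Vertex = Hashable
Succ = Dict[Vertex, Set[Vertex]]   # successor map: v -> E(v)
Owner = Dict[Vertex, int]          # owner[v] is 0 or 1


def upre(succ: Succ, U: Set[Vertex]) -> Set[Vertex]:
    """Vertices all of whose successors lie in U, cf. Eq. (1)."""
    return {v for v, ws in succ.items() if ws <= U}


def cpre(succ: Succ, owner: Owner, a: int, U: Set[Vertex]) -> Set[Vertex]:
    """Vertices from which Player a can force U in one step, cf. Eq. (2)."""
    own = {v for v, ws in succ.items() if owner[v] == a and ws & U}
    opp = {v for v in upre(succ, U) if owner[v] != a}
    return own | opp


def _closure(pre, U: Set[Vertex]) -> Set[Vertex]:
    """Iterate X -> X | pre(X) starting from U until it stabilizes, cf. Eqs. (3)-(4)."""
    cur = set(U)
    while True:
        nxt = cur | pre(cur)
        if nxt == cur:
            return cur
        cur = nxt


def attr(succ: Succ, owner: Owner, a: int, U: Set[Vertex]) -> Set[Vertex]:
    """Player-a attractor of U."""
    return _closure(lambda X: cpre(succ, owner, a, X), U)


def uattr(succ: Succ, U: Set[Vertex]) -> Set[Vertex]:
    """Universal attractor of U."""
    return _closure(lambda X: upre(succ, X), U)


# Hypothetical toy graph: 'b' belongs to Player 1, all other vertices to Player 0.
succ = {"a": {"b", "c"}, "b": {"a"}, "c": {"c"}, "d": {"a"}}
owner = {"a": 0, "b": 1, "c": 0, "d": 0}
print(sorted(attr(succ, owner, 0, {"c"})))
print(sorted(uattr(succ, {"a"})))
```

On a finite graph the union over all $i \geq 1$ in (4) coincides with the stabilized set, which is why a simple loop suffices here.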


3 Computation of Winning Strategy Templates 


Given a 2-player game $\mathcal{G}$ with an objective $\Phi$, the goal of this section is to
compute a strategy template that characterizes a large class of winning strategies
of Player 0 from a set of vertices $U \subseteq V$ in a local, permissive, and
computationally efficient way. These templates are then utilized in Sect. 5.1 for
computational synthesis. In particular, this section introduces three distinct template
classes: safety templates (Sect. 3.1), live-group templates (Sect. 3.2), and
co-live templates (Sect. 3.3), along with algorithms for their computation via safety,
Büchi, and co-Büchi games, respectively. We then turn to general parity objectives,
which can be thought of as a sophisticated combination of Büchi and
co-Büchi games. We show in Sect. 3.4 how the three introduced templates can be
derived for a general parity objective by a suitable combination of the
previously introduced algorithms for single templates. All presented algorithms have
the same worst-case computation time as the standard algorithms solving the
respective game. This shows that extracting strategy templates instead of “normal”
strategies does not incur an additional computational cost. We prove the
soundness of the algorithms and discuss the complexities in the full version [5,
Appendix A].
3.1 Safety Templates


We start the construction of strategy templates by restricting ourselves to games
with a safety objective, i.e., $\mathcal{G} = (G, \Phi)$ with $\Phi := \square U$ for some $U \subseteq V$. A
winning play in a safety game never leaves $U \subseteq V$. It is well known that such
games allow capturing all winning strategies by a simple local template which
essentially only allows Player 0 moves from winning vertices to other winning
vertices. This is formalized in our notation as a safety template as follows.


Theorem 1 ([8, Fact 7]). Let $\mathcal{G} = (G, \square U)$ be a safety game with winning
region $\mathcal{W}_0$ and $S = \{(u,v) \in E \mid (u \in V^0 \cap \mathcal{W}_0) \wedge (v \notin \mathcal{W}_0)\}$. Then


$$\Psi_{\mathrm{unsafe}}(S) := \square \bigwedge_{e \in S} \neg e \tag{5}$$


is a winning strategy template for the game $\mathcal{G}$ which is also maximally permissive.


It is easy to see that the computation of the safety template $\Psi_{\mathrm{unsafe}}(S)$
reduces to computing the winning region $\mathcal{W}_0$ in the safety game $(G, \square U)$ and
extracting $S$. We refer to the edges in $S$ as unsafe edges and we call this algorithm
computing the set $S$ SAFETYTEMPLATE($G$, $U$). Note that it runs in $O(m)$
time, where $m = |E|$, as safety games are solvable in $O(m)$ time.
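
A minimal sketch of this computation is given below. It assumes the standard reduction in which the Player 0 winning region of a safety game is the complement of the Player 1 attractor of the vertices outside the safe set; the function names, the graph encoding, and the toy graph are hypothetical, and the attractor loop is deliberately naive, so it does not meet the $O(m)$ bound.

```python
def attractor(succ, owner, player, target):
    """Naive attractor: vertices from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v, ws in succ.items():
            if v in attr:
                continue
            # The player's own vertices need one edge into attr, the opponent's need all.
            if (owner[v] == player and ws & attr) or (owner[v] != player and ws <= attr):
                attr.add(v)
                changed = True
    return attr


def safety_template(succ, owner, safe):
    """Return (winning region of Player 0, unsafe edges S) for the objective 'always safe'."""
    vertices = set(succ)
    win0 = vertices - attractor(succ, owner, 1, vertices - set(safe))
    unsafe = {(u, v) for u in win0 if owner[u] == 0
              for v in succ[u] if v not in win0}
    return win0, unsafe


# Hypothetical toy graph: Player 0 owns 'a' and 'f', Player 1 owns 'b' and 'e'.
succ = {"a": {"b", "e"}, "b": {"a"}, "e": {"f"}, "f": {"f"}}
owner = {"a": 0, "b": 1, "e": 1, "f": 0}
win0, unsafe = safety_template(succ, owner, {"a", "b", "e"})
print(sorted(win0), sorted(unsafe))
```

In this toy graph vertex `e` is safe but losing, because Player 1 moves from it to `f`; the only unsafe edge is therefore the Player 0 edge from `a` to `e`.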


3.2 Live-Group Templates 


As the next step, we now move to simple liveness objectives which require a
particular vertex set $I \subseteq V$ to be seen infinitely often. Here, winning strategies need
to stay in the winning region (as before) but in addition always eventually need
to make progress towards the vertex set $I$. We capture this required progress by
live-group templates: given a group of edges $H \subseteq E$, we require that whenever
a source vertex $v$ of an edge in $H$ is seen infinitely often, an edge $e \in H$ (not
necessarily starting at $v$) also needs to be taken infinitely often. This template
ensures that compliant strategies always eventually make progress towards $I$, as
illustrated by the following example.


Example 1. Consider the game graph in Fig. 2 where we require visiting $\{c,d\}$
infinitely often. To satisfy this objective from vertex $a$, Player 0 needs to not
get stuck at $a$, and should not visit $b$ always (since Player 1 can force visiting
$a$ again, and stop Player 0 from satisfying the objective). Hence, Player 0 has
to always eventually leave $a$ and go to $\{c,d\}$. This can be captured by the
live-group $\{e_{ac}, e_{ad}\}$. Now if the play comes to $a$ infinitely often, Player 0 will go to
either $c$ or $d$ infinitely often, hence satisfying the objective.


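Formally, such games are called Büchi games, denoted by $\mathcal{G} = (G = (V, E), \Phi)$
with $\Phi = \square\lozenge I$, for some $I \subseteq V$. In addition, a live-group $H = \{e_j\}_{j \geq 0}$ is a set
of edges $e_j = (s_j, t_j)$ with source vertices $\mathit{src}(H) := \{s_j\}_{j \geq 0}$. Given a set of
live-groups $H_\ell = \{H_i\}_{i \geq 0}$ we define a live-group template as


$$\Psi_{\mathrm{live}}(H_\ell) := \bigwedge_{i \geq 0} \square\lozenge\, \mathit{src}(H_i) \Rightarrow \square\lozenge H_i. \tag{6}$$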
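Algorithm 1. BÜCHITEMPLATE($G$, $I$)
Input: A game graph $G$, and a subset of vertices $I$
Output: A set of unsafe edges $S$ and a set of live-groups $H_\ell$
  $\mathcal{W}_0 \leftarrow$ BÜCHI($G$, $I$); $S \leftarrow$ SAFETYTEMPLATE($G$, $\mathcal{W}_0$);
  $G \leftarrow G|_{\mathcal{W}_0}$; $I \leftarrow I \cap \mathcal{W}_0$;
  $H_\ell \leftarrow$ REACHTEMPLATE($G$, $I$);
  return ($S$, $H_\ell$)
procedure REACHTEMPLATE($G$, $I \subseteq V$)
  $H_\ell \leftarrow \emptyset$;
  while $I \neq V$ do
    $A \leftarrow \mathrm{uattr}_G(I)$; $B \leftarrow \mathrm{cpre}^0_G(A)$; $H_\ell \leftarrow H_\ell \cup \{$EDGES($B$, $A$)$\}$; $I \leftarrow A \cup B$;
  return $H_\ell$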


The live-group template says that if some vertex from the source of a live-group is
visited infinitely often, then some edge from this group should be taken infinitely
often by any strategy that follows the template.

Intuitively, winning strategy templates for Büchi games consist of a safety
template conjoined with a live-group template. While the former enforces all
strategies to stay within the winning region $\mathcal{W}_0$, the latter enforces progress
w.r.t. the goal set $I$ within $\mathcal{W}_0$. Therefore, the computation of a winning strategy
template for Büchi games reduces to the computation of the unsafe set $S$ to
define $\Psi_{\mathrm{unsafe}}(S)$ in (5) and the live-groups $H_\ell$ to define $\Psi_{\mathrm{live}}(H_\ell)$ in (6). We
denote by BÜCHITEMPLATE($G$, $I$) the algorithm computing the above as detailed
in Algorithm 1. The algorithm uses some new notation that we define here.
The function BÜCHI solves a Büchi game and returns the winning region
(e.g., using the standard algorithm from [17]), and
EDGES($X$, $Y$) $= \{(u,v) \in E \mid u \in X, v \in Y\}$ is the set of edges between two subsets of vertices $X$ and $Y$.
$G|_U = (U = U^0 \mathbin{\dot\cup} U^1, E')$ s.t. $U^0 := V^0 \cap U$, $U^1 := V^1 \cap U$, and $E' := E \cap (U \times U)$
denotes the restriction of a game graph $G := (V = V^0 \mathbin{\dot\cup} V^1, E)$ to a subset of
its vertices $U \subseteq V$. We have the following formal result.


Theorem 2. Given a Büchi game $\mathcal{G} = (G, \square\lozenge I)$ for some $I \subseteq V$, if ($S$, $H_\ell$) $=$
BÜCHITEMPLATE($G$, $I$) then $\Psi = \{\Psi_{\mathrm{unsafe}}(S), \Psi_{\mathrm{live}}(H_\ell)\}$ is a winning strategy
template for the game $\mathcal{G}$, computable in time $O(nm)$, where $n = |V|$ and $m = |E|$.
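
The REACHTEMPLATE loop of Algorithm 1 can be sketched as follows, assuming the input graph has already been restricted to the winning region so that every vertex can be driven to the goal. Two simplifications are assumptions of this sketch and not taken from the algorithm as printed: only edges leaving $B \setminus A$ are recorded, and empty groups are skipped, so vertices that are already forced into the goal contribute no group. All names and the toy graph are hypothetical.

```python
def uattr(succ, target):
    """Universal attractor: vertices from which every play reaches `target`."""
    cur = set(target)
    while True:
        nxt = cur | {v for v, ws in succ.items() if ws <= cur}
        if nxt == cur:
            return cur
        cur = nxt


def cpre0(succ, owner, target):
    """Vertices from which Player 0 can force `target` in one step."""
    return {v for v, ws in succ.items()
            if (owner[v] == 0 and ws & target) or (owner[v] == 1 and ws <= target)}


def reach_template(succ, owner, goal):
    """Collect live-groups that always eventually push the play towards `goal`."""
    vertices, covered, groups = set(succ), set(goal), []
    while covered != vertices:
        A = uattr(succ, covered)
        B = cpre0(succ, owner, A)
        # Simplification assumed here: sources are taken from B \ A only.
        group = {(u, v) for u in B - A for v in succ[u] if v in A}
        if group:
            groups.append(group)
        if A | B == covered:
            raise ValueError("some vertex cannot be forced to reach the goal")
        covered = A | B
    return groups


# Hypothetical toy graph: Player 1 owns 'b', Player 0 owns 'a', 'c', 'd'.
succ = {"a": {"a", "b", "c"}, "b": {"a"}, "c": {"a"}, "d": {"a", "d"}}
owner = {"a": 0, "b": 1, "c": 0, "d": 0}
print(reach_template(succ, owner, {"c"}))
```

For this toy graph the sketch yields two singleton live-groups: the edge from `a` to `c`, and then the edge from `d` to `a`, which forces `d` to eventually leave its self-loop.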


While live-group templates capture infinitely many winning strategies in 
Büchi games, they are not maximally permissive, as exemplified next. 


Example 2. Consider the game graph in Fig. 2 restricted to the vertex set
$\{a,b,d\}$ with the Büchi objective $\square\lozenge d$. Our algorithm outputs the live-group
template $\Psi = \Psi_{\mathrm{live}}(\{e_{ad}\})$. Now consider the winning strategy with memory
that takes edge $e_{da}$ from $d$, and takes $e_{ab}$ for play suffix $bda$ and $e_{ad}$ for play
suffix $aba$. This strategy does not follow the template: the play $(abd)^\omega$ is in
$\mathcal{L}(\pi^0)$ but not in $\mathcal{L}(\Psi)$.
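
The claim about the play $(abd)^\omega$ can be checked mechanically. The sketch below assumes that a play is given by its repeating cycle only, which is enough because the live-group condition (6) depends solely on what happens infinitely often; the function name is hypothetical.

```python
def follows_live_groups(cycle, groups):
    """Check the live-group template on the play that repeats `cycle` forever."""
    seen = set(cycle)                                  # vertices seen infinitely often
    taken = set(zip(cycle, cycle[1:] + cycle[:1]))     # edges taken infinitely often
    for group in groups:
        sources = {u for (u, _) in group}
        if sources & seen and not (group & taken):
            return False
    return True


live_groups = [{("a", "d")}]                                # the live-group {e_ad}
print(follows_live_groups(["a", "b", "d"], live_groups))    # the play (abd)^omega
print(follows_live_groups(["a", "d"], live_groups))         # the play (ad)^omega
```

The first play visits $a$ infinitely often without ever taking $e_{ad}$, so it violates the template although it visits $d$ infinitely often; the second play takes $e_{ad}$ in every round and complies.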
3.3 Co-live Templates


We now turn to yet another objective which is the dual of the one discussed
before. The objective requires that eventually, only a particular subset of vertices
$I$ is seen. A winning strategy for this objective would try to stop leaving or
staying away from $I$ after a finite amount of time. It is easy to notice that
live-group templates cannot ensure this, but it can be captured by co-live templates:
given a set of edges, eventually these edges are not taken anymore. Intuitively,
these are the edges that take or keep a play away from $I$.


Example 3. Consider the game graph in Fig. 2 where we require that the play eventually
stops visiting $b$, i.e., stays in $I = \{a,c,d\}$. To satisfy this objective from vertex $a$,
Player 0 needs to stop getting out of $I$ eventually. Hence, Player 0 has to stop
taking the edges $\{e_{ab}, e_{db}, e_{de}\}$, which can be ensured by marking these edges
co-live. Since the edges leading to $b$ are then eventually no longer taken, the play
eventually stays in $I$, satisfying the objective. We note that this cannot be captured by the
live-groups $\{e_{aa}, e_{ac}, e_{ad}\}$ and $\{e_{da}\}$, since the strategy that visits $c$ and $b$ alternately
from Player 0’s vertices does not satisfy the objective, but follows the live-groups.


Formally, a co-Büchi game is a game $\mathcal{G} = (G, \Phi)$ with co-Büchi winning
condition $\Phi := \lozenge\square I$, for some goal vertices $I \subseteq V$. A play is winning for Player 0
in such a co-Büchi game if it eventually stays in $I$ forever. The co-live template
is defined by a set of co-live edges $D$ as follows,


$$\Psi_{\mathrm{colive}}(D) := \bigwedge_{e \in D} \lozenge\square \neg e.$$


The intuition behind the winning template is that it forces staying in the
winning region using the safety template, and ensures that the play does not go
away from the vertex set $I$ infinitely often using the co-live template. We provide
the procedure in Algorithm 2 and its correctness in the following theorem. Here,
COBÜCHI($G$, $I$) is a standard algorithm solving the co-Büchi game with the goal
vertices $I$, and outputs the winning regions for both players [17]. We also use the
standard algorithm SAFETY($G$, $I$) that solves the safety game with the objective
to stay in $I$ forever.


Theorem 3. Given a co-Büchi game $\mathcal{G} = (G, \lozenge\square I)$ for some $I \subseteq V$, if
($S$, $D$) $=$ COBÜCHITEMPLATE($G$, $I$) then $\Psi = \{\Psi_{\mathrm{unsafe}}(S), \Psi_{\mathrm{colive}}(D)\}$ is a winning
strategy template for Player 0, computable in time $O(nm)$ with $n = |V|$ and
$m = |E|$.


3.4 Parity Games 


We now consider a more complex but canonical class of $\omega$-regular objectives.
Parity objectives are of central importance in the study of synthesis problems
as they are general enough to model a huge class of qualitative requirements of
cyber-physical systems, while enjoying properties like positional determinacy.
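Algorithm 2. COBÜCHITEMPLATE($G$, $I$)
Input: A game graph $G$, and a subset of vertices $I$
Output: A set of unsafe edges $S$ and a set of co-live edges $D$
  $S \leftarrow \emptyset$; $D \leftarrow \emptyset$
  $\mathcal{W}_0 \leftarrow$ COBÜCHI($G$, $I$); $S \leftarrow$ SAFETYTEMPLATE($G$, $\mathcal{W}_0$)
  $G \leftarrow G|_{\mathcal{W}_0}$; $I \leftarrow I \cap \mathcal{W}_0$;
  while $V \neq \emptyset$ do
    $A \leftarrow$ SAFETY($G$, $I$); $D \leftarrow D \cup$ EDGES($A$, $V \setminus A$);
    while $\mathrm{cpre}^0_G(A) \neq A$ do    ▷ Outputs $\mathrm{attr}^0_G(A)$
      $B \leftarrow \mathrm{cpre}^0_G(A)$;
      $D \leftarrow D \cup$ EDGES($B$, $V \setminus (A \cup B)$) $\cup$ EDGES($B$, $B$);
      $A \leftarrow A \cup B$;
    $G \leftarrow G|_{V \setminus A}$; $I \leftarrow I \cap V$;
  return ($S$, $D$)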


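A parity game is a game $\mathcal{G} = (G, \Phi)$ with parity winning condition $\Phi =
Parity(\mathbb{P})$, where


$$Parity(\mathbb{P}) := \bigwedge_{i \in_{\mathrm{odd}} [0;d]} \Big( \square\lozenge P_i \Rightarrow \bigvee_{j \in_{\mathrm{even}} [i+1;d]} \square\lozenge P_j \Big), \tag{7}$$


with $P_i = \{v \in V \mid \mathbb{P}(v) = i\}$ for some priority function $\mathbb{P} : V \to [0;d]$ that
assigns each vertex a priority. A play is winning for Player 0 in such a game if
the maximum of priorities seen infinitely often is even.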
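
As a small sanity check, the "maximum priority seen infinitely often is even" characterization and the formula in (7) can be compared on plays given by their repeating cycle. This is only a sketch; the priority assignment and the function names are hypothetical.

```python
def max_priority_is_even(cycle, priority):
    """Player 0 wins iff the largest priority seen infinitely often is even."""
    return max(priority[v] for v in cycle) % 2 == 0


def satisfies_parity_formula(cycle, priority):
    """Evaluate Eq. (7): every odd priority seen infinitely often must be
    dominated by a larger even priority that is also seen infinitely often."""
    inf = {priority[v] for v in cycle}
    return all(any(j > i and j % 2 == 0 for j in inf)
               for i in inf if i % 2 == 1)


priority = {"x": 1, "y": 2, "z": 3}   # hypothetical priorities
for cycle in (["x", "y"], ["y", "z"], ["x"], ["y"]):
    assert max_priority_is_even(cycle, priority) == satisfies_parity_formula(cycle, priority)
    print(cycle, max_priority_is_even(cycle, priority))
```

Both checks agree on every cycle of this toy example, which illustrates why the two readings of the parity condition are used interchangeably.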

Although parity objectives subsume previously described objectives, we can 
construct strategy templates for parity games using the combinations of previ- 
ously defined templates. To this end, we give the following algorithm. 


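Theorem 4. Given a parity game $\mathcal{G} = (G, Parity(\mathbb{P}))$ with priority function
$\mathbb{P}: V \to [0;d]$, if (($\mathcal{W}_0$, $\mathcal{W}_1$), $H_\ell$, $D$) $=$ PARITYTEMPLATE($G$, $\mathbb{P}$), then
$\Psi = \{\Psi_{\mathrm{unsafe}}(S), \Psi_{\mathrm{live}}(H_\ell), \Psi_{\mathrm{colive}}(D)\}$ is a winning strategy template for the game
$\mathcal{G}$, where $S =$ EDGES($\mathcal{W}_0$, $\mathcal{W}_1$). Moreover, the algorithm terminates in time
$O(n^{d+O(1)})$, which is the same as that of Zielonka’s algorithm.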


We refer the readers to the full version [5, Appendix A.3] for the complete 
proofs, and here we provide the intuition behind the algorithm and the computa- 
tion of the algorithm on the parity game in Fig. 3. The algorithm follows the divide- 


Fig. 3. A parity game, where a vertex with priority i has label p;. The dotted edge in 
red is a co-live edge, while the dashed edges in blue are singleton live-groups. (Color 
figure online) 
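Algorithm 3. PARITYTEMPLATE($G$, $\mathbb{P}$)
Input: A game graph $G$, and a priority function $\mathbb{P}: V \to \{0,\ldots,d\}$
Output: Winning regions ($\mathcal{W}_0$, $\mathcal{W}_1$), live-groups $H_\ell$, and co-live edges $D$
1: if $d$ is odd then
2:   $A = \mathrm{attr}^1_G(P_d)$
3:   if $A = V$ then return ($\emptyset$, $V$), $\emptyset$, $\emptyset$
4:   else
5:     ($\mathcal{W}_0$, $\mathcal{W}_1$), $H_\ell$, $D \leftarrow$ PARITYTEMPLATE($G|_{V \setminus A}$, $\mathbb{P}$)
6:     if $\mathcal{W}_0 = \emptyset$ then return ($\emptyset$, $V$), $\emptyset$, $\emptyset$
7:     else
8:       $B = \mathrm{attr}^0_G(\mathcal{W}_0)$
9:       $D \leftarrow D \cup$ EDGES($\mathcal{W}_0$, $V \setminus \mathcal{W}_0$)
10:      $H_\ell \leftarrow H_\ell \cup$ REACHTEMPLATE($G$, $\mathcal{W}_0$)
11:      ($\mathcal{W}_0'$, $\mathcal{W}_1'$), $H_\ell'$, $D' \leftarrow$ PARITYTEMPLATE($G|_{V \setminus B}$, $\mathbb{P}$)
12:      return ($\mathcal{W}_0' \cup B$, $\mathcal{W}_1'$), $H_\ell \cup H_\ell'$, $D \cup D'$
13: else    ▷ If $d$ is even
14:   $A = \mathrm{attr}^0_G(P_d)$
15:   if $A = V$ then return ($V$, $\emptyset$), REACHTEMPLATE($G$, $P_d$), $\emptyset$
16:   else
17:     ($\mathcal{W}_0$, $\mathcal{W}_1$), $H_\ell$, $D \leftarrow$ PARITYTEMPLATE($G|_{V \setminus A}$, $\mathbb{P}$)
18:     if $\mathcal{W}_1 = \emptyset$ then return ($V$, $\emptyset$), $H_\ell \cup$ REACHTEMPLATE($G|_A$, $P_d$), $D$
19:     else
20:       $B = \mathrm{attr}^1_G(\mathcal{W}_1)$
21:       ($\mathcal{W}_0'$, $\mathcal{W}_1'$), $H_\ell'$, $D' \leftarrow$ PARITYTEMPLATE($G|_{V \setminus B}$, $\mathbb{P}$)
22:       return ($\mathcal{W}_0'$, $\mathcal{W}_1' \cup B$), $H_\ell'$, $D'$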


and-conquer approach of Zielonka’s algorithm. Since the highest priority occurring
is 6, which is even, we first find the vertices $A = \{d, h\}$ from which Player 0 can force
visiting $\{d\}$ (vertices with priority 6) in line 14. Then since $A \neq V$, we find the
winning strategy template in the rest of the graph $G_1 = G|_{V \setminus A}$. Then the highest
priority 5 is odd, hence we compute the region $\{c\}$ from which Player 1 can ensure
visiting 5. We again restrict our graph to $G_2 = G|_{\{a,b,e,f,g\}}$. Again, the highest
priority is even. We further compute the region $A_2 = \{a,b\}$ from which Player 0 can
ensure visiting the priority 4, giving us $G_3 = G|_{\{e,f,g\}}$. In $G_3$, Player 0 can ensure
visiting the highest priority 2, hence satisfying the condition in line 15. Then since
in this small graph, Player 0 needs to keep visiting priority 2 infinitely often, which 
gives us the live-groups {egf} and {epp} in line 15. Coming one recursive step back 
to $G_2$, since $G_3$ doesn’t have a winning vertex for Player 1, the if condition in
line 18 is satisfied. Hence, for the vertices in $A_2$, it suffices to keep visiting priority
4 to win, which is ensured by the live-group {e,,} added in the line 18. Now, again 
going one recursive step back to $G_1$, we have $\mathcal{W}_0 = \{a,b,e,f,g\}$. If Player 0 can
ensure reaching and staying in $\mathcal{W}_0$ from the rest of the graph $G_1$, it can satisfy the
parity condition. Since from the vertex c, Wo will anyway be reached, we get a co- 
live edge evc in line 9 to eventually keep the play in Wo. Coming back to the initial 
recursive call, since now again $G_1$ was winning for Player 0, they only need to be
able to visit the priority 6 from every vertex in $A$, giving another live-group $\{e_{hd}\}$.




4 Extracting Strategies from Strategy Templates 


This section discusses how a strategy that follows a computed winning strategy
template can be extracted from the template. As our templates are just
particular LTL formulas, one can of course use automata-theoretic techniques for
this. However, as the types of templates we presented put some local restrictions
on strategies, we can extract a strategy much more efficiently. For instance, the
game in Fig. 2 with strategy template $\Psi = \Psi_{\mathrm{live}}(\{e_{ac}, e_{ad}\})$ allows the strategy
that simply uses the edges $e_{ac}$ and $e_{ad}$ alternately from vertex $a$.

However, strategy extraction is not as straightforward for every template,
even if it only conjoins the three template types we introduced in Sect. 3. For
instance, consider again the game graph from Fig. 2 with a strategy template
$\Psi = \{\Psi_{\mathrm{unsafe}}(e_{ac}, e_{ad}), \Psi_{\mathrm{colive}}(e_{aa}, e_{ab})\}$. Here, none of the four choices of Player 0
(i.e., outgoing edges) from vertex $a$ can be taken infinitely often, and, hence, the
only way a play satisfies $\Psi$ is to not visit vertex $a$ infinitely often. On the other
hand, given strategy template W = {Weorve(eab, Cab); Yuva ({€ad; Cacs Cav }) }, edge 
ea is both live and co-live, which raises a conflict for vertex d. Hence, the only 
way a strategy can follow $\Psi'$ is again to ensure that $d$ is not visited infinitely
often. We call such situations conflicts. Interestingly, the methods we presented 
in Sect. 3 never create such conflicts and the computed templates are therefore 
conflict-free, as formalized next and proven in the full version [5, Appendix A.4]. 


Definition 1. A strategy template $\Psi = \{\Psi_{\mathrm{unsafe}}(S), \Psi_{\mathrm{colive}}(D), \Psi_{\mathrm{live}}(H_\ell)\}$ in a
game graph $G = (V, E)$ is conflict-free if the following are true:


(i) for every vertex $v$, there is an outgoing edge that is neither co-live nor unsafe,
i.e., $v \times E(v) \not\subseteq D \cup S$, and

(ii) for every source vertex $v$ in a live-group $H \in H_\ell$, there exists an outgoing
edge in $H$ which is neither co-live nor unsafe, i.e., $v \times H(v) \not\subseteq D \cup S$.


Proposition 1. Algorithms 1, 2, and 3 always return conflict-free templates. 
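
The two conditions of Definition 1 translate directly into a check over edge sets. The sketch below is illustrative only: the function name and the graph encoding are hypothetical, and of the toy graph only the four outgoing edges of vertex `a` are taken from the conflicting template discussed above; the remaining edges are made up.

```python
def is_conflict_free(succ, unsafe, colive, live_groups):
    """Check conditions (i) and (ii) of Definition 1 on explicit edge sets."""
    blocked = set(unsafe) | set(colive)
    # (i) every vertex keeps an outgoing edge that is neither unsafe nor co-live
    for v, ws in succ.items():
        if all((v, w) in blocked for w in ws):
            return False
    # (ii) every source of a live-group keeps such an edge inside that group
    for group in live_groups:
        for src in {u for (u, _) in group}:
            if all(e in blocked for e in group if e[0] == src):
                return False
    return True


# Vertex 'a' has the four choices a, b, c, d; the other edges are hypothetical.
succ = {"a": {"a", "b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}

# All four choices at 'a' are forbidden, so condition (i) fails.
print(is_conflict_free(succ, {("a", "c"), ("a", "d")}, {("a", "a"), ("a", "b")}, []))
# A live-group whose edges are all co-live violates condition (ii).
print(is_conflict_free(succ, set(), {("a", "c"), ("a", "d")}, [{("a", "c"), ("a", "d")}]))
# One co-live edge next to an untouched live-group is fine.
print(is_conflict_free(succ, set(), {("a", "b")}, [{("a", "c"), ("a", "d")}]))
```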


Due to the given conflict-freeness, winning strategies are indeed easy to 
extract from winning strategy templates, as formalized next. 


Proposition 2. Given a game graph $G = (V, E)$ with conflict-free winning
strategy template $\Psi = \{\Psi_{\mathrm{unsafe}}(S), \Psi_{\mathrm{colive}}(D), \Psi_{\mathrm{live}}(H_\ell)\}$, a winning strategy $\pi^0$
that follows $\Psi$ can be extracted in time $O(m)$, where $m$ is the number of edges.


The proof is straightforward by constructing the winning strategy as follows.
We first remove all unsafe and co-live edges from $G$ and then construct a strategy
$\pi^0$ that alternates between all remaining edges from every vertex in $\mathcal{W}_0$. This
strategy is well defined as condition (i) in Definition 1 ensures that after removing
all the unsafe and co-live edges a choice from every vertex remains. Moreover, if
the vertex is a source of a live-group edge, condition (ii) in Definition 1 ensures
that there are outgoing edges satisfying every live-group. It is easy to see that
the constructed strategy indeed follows $\Psi$ and is hence winning from vertices in
$\mathcal{W}_0$, as $\Psi$ was a winning strategy template. We call this procedure of strategy
extraction EXTRACTSTRATEGY($G$, $\Psi$).
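
The construction in the proof can be sketched as a round-robin strategy. The encoding below is an assumption of this sketch: the strategy is returned as a closure that keeps one cyclic iterator per Player 0 vertex, the successor order is fixed by sorting, and the toy graph is hypothetical.

```python
from itertools import cycle


def extract_strategy(succ, owner, win0, unsafe, colive):
    """Return a move function that alternates between all edges that are
    neither unsafe nor co-live, for every Player 0 vertex in the winning region."""
    forbidden = set(unsafe) | set(colive)
    choices = {}
    for v in win0:
        if owner[v] != 0:
            continue
        allowed = sorted(w for w in succ[v] if (v, w) not in forbidden)
        if not allowed:
            raise ValueError(f"template is not conflict-free at vertex {v!r}")
        choices[v] = cycle(allowed)   # finite memory: position in the round-robin

    def move(v):
        return next(choices[v])

    return move


# Hypothetical toy graph with a co-live self-loop at 'a'.
succ = {"a": {"a", "b", "c"}, "b": {"a"}, "c": {"a"}}
owner = {"a": 0, "b": 1, "c": 0}
move = extract_strategy(succ, owner, {"a", "b", "c"}, set(), {("a", "a")})
print([move("a") for _ in range(4)])
```

Because every allowed edge is used again and again whenever its source is revisited, each live-group with a source at that vertex is served infinitely often, while unsafe and co-live edges are never taken.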
5 Applications of Strategy Templates


This section considers two concrete applications of strategy templates which 
utilize their structural simplicity and easy adaptability. 

In the context of CPS control design problems, it is well known that the 
game graph of the resulting parity game used for strategy synthesis typically has 
a physical interpretation and results from behavioral constraints on the existing 
technical system that is subject to control. In particular, following the well- 
established paradigm of abstraction-based control design (ABCD) [2,7,39], an 
underlying (stochastic) disturbed non-linear dynamical system can be automat- 
ically abstracted into a two-player game graph using standard abstraction tools, 
e.g. SCOTS [35], ARCS [13], MASCOT [20], P-FACES [22], or ROCS [27]. 

In contrast to classical problems in reactive synthesis, it is very natural in 
this context to think about the game graph and the specification as two different 
objects. Here, specifications are naturally expressed via propositions that are 
defined over sets of states of this underlying game graph, without changing its 
structure. This separation is for example also present in the known LTL fragment 
GR(1) [10]. Arguably, this feature has contributed to the success of GR(1)-based 
synthesis for CPS applications, e.g. [1,3,24,25,38, 40,41]. 

Given this insight, it is natural to define the incremental synthesis problem 
such that the game graph stays unchanged, while newly arriving specifications 
are modeled as new parity conditions over the same game graph. Formally, this 
results in a generalized parity game where the different objectives arrive one at a 
time. We show an incremental algorithm for synthesizing winning strategies for 
such games in Sect. 5.1. Similarly, fault-tolerant control requires the controller to 
adapt to unavailable actuators within the technical system under control. This 
naturally translates to the removal of Player 0 edges within the game graph 
given its physical interpretation. We show how strategy templates can be used 
to adapt winning strategies to these game graph modifications in Sect. 5.2. 


5.1 Incremental Synthesis via Strategy Templates 


In this section we consider a 2-player game $G$ with a conjunction $\Phi = \bigwedge_{i=1}^{k} \Phi_i$ of
multiple parity objectives $\Phi_i$, also called a generalized parity objective. However,
in comparison to existing work [12,16], we consider the case that different objectives
$\Phi_i$ might not arrive all at the same time. The intuition of our algorithm
is to solve each parity game $(G,\Phi_i)$ separately and then combine the resulting
strategy templates $\Psi_i$ to a global template $\Psi = \bigwedge_{i=1}^{k} \Psi_i$. This allows us to easily
incorporate newly arriving objectives $\Phi_{k+1}$. We only need to solve the parity
game $(G, \Phi_{k+1})$ and then combine the resulting template $\Psi_{k+1}$ with $\Psi$.

While Proposition 1 ensures that every individual template $\Psi_i$ is conflict-free,
this unfortunately does not imply that their conjunction is also conflict-free.
Intuitively, combining strategy templates can cause conditions (i)
and (ii) in Definition 1 to no longer hold, resulting in a conflict. As already
discussed in Sect. 4, this requires source vertices $U \subseteq V$ with such conflicts to
eventually not be visited anymore. We therefore resolve such conflicts by adding
the specification $\Diamond\Box\neg U$ to every objective and recomputing the templates.

Algorithm 4. COMPOSETEMPLATE($G$, $(W_0', H', D', (\Phi_i)_{i<\ell})$, $(\Phi_i)_{\ell \le i < k}$) where $\Phi_i = \text{Parity}(P_i)$

Input: A generalized parity game $G = (V,E)$ and objectives $(\Phi_i)_{i<k}$ with $\Phi_i = \text{Parity}(P_i)$
such that $P_i : V \to [0; 2d_i+1]$, along with a partial winning region, live-groups,
and co-live edges $(W_0', H', D')$ for the generalized parity game $(G, \bigwedge_{i<\ell} \Phi_i)$.

Output: A partial winning region $W_0$, live-groups $H$, co-live edges $D$, and modified
parity objectives $(\Phi_i)_{i<k}$.

1: $(W_i, V \setminus W_i), H_i, D_i \leftarrow$ PARITYTEMPLATE($G|_{W_0'}$, $\Phi_i$) for each $\ell \le i < k$
2: $H \leftarrow H' \cup \bigcup_{\ell \le i < k} H_i$; $D \leftarrow D' \cup \bigcup_{\ell \le i < k} D_i$; $W_0 \leftarrow W_0' \cap \bigcap_{\ell \le i < k} W_i$
3: $C_1 = \{u \in W_0 \mid u \times (E(u) \cap W_0) \subseteq D\}$
4: $C_2 = \{u \in W_0 \mid u \times (H_j(u) \cap W_0) \subseteq D,\ H_j \in H,\ H_j(u) \neq \emptyset\}$
5: if $C_1 \cup C_2 = \emptyset$ then
6:     return $(W_0, H, D, (\Phi_i)_{i<k})$
7: else
8:     $P_i \leftarrow P_i[C_1 \cup C_2 \to 2d_i + 1]$ for each $i < k$
9:     return COMPOSETEMPLATE($G$, $(W_0, \emptyset, \emptyset, \emptyset)$, $(\Phi_i)_{i<k}$) with $\Phi_i = \text{Parity}(P_i)$

To efficiently formalize this objective change, we note that a parity objective
Parity($P$) with an additional specification $\Diamond\Box\neg U$ for some $U \subseteq V$ is equivalent to
another parity objective Parity($P'$), where the priority function $P'$ can be obtained
from $P : V \to [0; 2d+1]$ just by modifying the priorities of vertices in $U$ to $2d+1$.
Let us denote such a priority function by $P[U \to 2d+1]$. In particular, we have
the following result:


Lemma 1. Given a game graph $G$ and two parity objectives $\Phi = \text{Parity}(P)$,
$\Phi' = \text{Parity}(P')$ such that $P : V \to [0; 2d+1]$ and $P' = P[U \to 2d+1]$ for some
vertex set $U \subseteq V$, it holds that $\mathcal{L}(\Phi') = \mathcal{L}(\Phi \wedge \Diamond\Box\neg U)$. Moreover, if a strategy
template is winning from some vertex $u$ in the game $\mathcal{G}' = (G, \Phi')$, then it is also
winning from $u$ in the game $\mathcal{G} = (G, \Phi)$.
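
The priority update $P[U \to 2d+1]$ is a one-line dictionary transformation. The sketch below is illustrative only: it assumes the priority function is stored as a Python `dict` from vertices to integers and that a play is winning when the highest priority seen infinitely often is even, so that the odd top priority $2d+1$ forbids visiting $U$ infinitely often. The name `raise_priorities` is hypothetical.

```python
def raise_priorities(priority, conflicted, d):
    """Return P[U -> 2d+1]: a copy of `priority` in which every vertex of
    `conflicted` (the set U) gets the odd top priority 2d+1."""
    top = 2 * d + 1
    return {v: (top if v in conflicted else p) for v, p in priority.items()}


# Toy priority function with d = 1, i.e. priorities in [0; 3] (assumed data).
P = {"a": 0, "b": 2, "c": 1}
P_prime = raise_priorities(P, conflicted={"a"}, d=1)
```

The original function is left untouched, which is convenient when the objectives have to be recomputed several times during the recursion of Algorithm 4.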


Using the above ideas, we present Algorithm 4 to solve generalized parity
games (possibly incrementally). If no partial solution to the synthesis problem
exists so far, we have $\ell = 0$; otherwise the game $(G, \bigwedge_{i<\ell} \Phi_i)$ was already solved
and the respective winning region and templates are known. In both cases, the
algorithm starts by computing a winning strategy template for each game
$(G,\Phi_i)$ with $\ell \le i < k$ (line 1) and conjoins them with the already computed
ones (line 2). Then the algorithm checks for conflicts (lines 3-4). If there is
some conflict, the algorithm modifies the objectives to ensure that the conflicted
vertices are eventually not visited anymore (line 8), and then recomputes the
templates in the game graph restricted to the intersection of winning regions
for all objectives (line 9). If there is no conflict, then the algorithm returns the
conjunction of the templates, which is conflict-free and hence winning from
the intersection of winning regions for every objective (line 6). The latter is
formalized in the following theorem. The proof can be found in the full version [5,
Appendix B.2].


Theorem 5. Given a generalized parity game $\mathcal{G} = (G, \bigwedge_{i<k} \Phi_i)$ with
$\Phi_i = \text{Parity}(P_i)$ and priority functions $P_i : V \to [0; 2d_i+1]$, if
$(W_0, H, D, (\Phi_i')_{i<k}) =$ COMPOSETEMPLATE($G$, $(V, \emptyset, \emptyset, \emptyset)$, $(\Phi_i)_{i<k}$), then $\Psi =
\{\Psi_{\text{UNSAFE}}(S), \Psi_{\text{LIVE}}(H), \Psi_{\text{COLIVE}}(D)\}$ is a conflict-free strategy template that is
winning from $W_0$ in the game $\mathcal{G}$, where $S = \text{EDGES}(W_0, V \setminus W_0)$. Further,
$\Psi$ is computable in time $O(kn^{2d+3})$, where $n = |V|$ and $d = \max_{i<k} d_i$.
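
The conflict check in lines 3-4 of Algorithm 4 can be illustrated as follows. This is a hedged sketch, not the authors' implementation: it assumes edges are vertex pairs, each live-group is a set of such pairs, and the function name `find_conflicts` is made up for this example.

```python
def find_conflicts(edges, winning, colive, live_groups):
    """Return (C1, C2) for a combined template.

    C1: vertices whose edges inside the winning region are all co-live.
    C2: vertices that are the source of some live-group whose edges inside
        the winning region are all co-live.
    """
    # Successors of each winning vertex that stay inside the winning region.
    succ = {}
    for (u, v) in edges:
        if u in winning and v in winning:
            succ.setdefault(u, set()).add(v)

    c1 = {u for u in winning
          if all((u, v) in colive for v in succ.get(u, ()))}

    c2 = set()
    for group in live_groups:
        targets_by_source = {}
        for (u, v) in group:
            if u in winning:
                targets_by_source.setdefault(u, []).append(v)
        for u, targets in targets_by_source.items():
            inside = [v for v in targets if v in winning]
            if all((u, v) in colive for v in inside):
                c2.add(u)
    return c1, c2


# Toy instance (assumed data): the only live-group edge of "a" is co-live.
edges = [("a", "a"), ("a", "b"), ("b", "a")]
c1, c2 = find_conflicts(edges, {"a", "b"}, {("a", "b")}, [{("a", "b")}])
```

If both sets are empty, the combined template can be returned directly; otherwise the vertices in the union are the ones whose priorities are raised in line 8.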


Due to the conflict checks carried out within Algorithm 4, the returned modified
objectives $\Phi_i'$ ensure that the conjunction $\Psi := \bigwedge_{i<k} \Psi_i'$ of winning strategy
templates $\Psi_i'$ for the games $(G, \Phi_i')$ is indeed conflict-free. In particular, the
conjoined template $\Psi$ is actually returned by the algorithm. Hence, incrementally
running Algorithm 4 is actually sound. This is an immediate consequence of
Theorem 5 and stated as a corollary next.


Corollary 1. Given a generalized parity game $\mathcal{G} = (G, \bigwedge_{i<k} \Phi_i)$ with $\Phi_i =
\text{Parity}(P_i)$ and priority functions $P_i : V \to [0; 2d_i+1]$, s.t.

$(W_0', H', D', (\Phi_i')_{i<\ell}) :=$ COMPOSETEMPLATE($G$, $(V, \emptyset, \emptyset, \emptyset)$, $(\Phi_i)_{i<\ell}$), and
$(W_0, H, D, (\Phi_i'')_{i<k}) :=$ COMPOSETEMPLATE($G$, $(W_0', H', D', (\Phi_i')_{i<\ell})$, $(\Phi_i)_{\ell \le i < k}$),

then $\Psi = \{\Psi_{\text{UNSAFE}}(S), \Psi_{\text{LIVE}}(H), \Psi_{\text{COLIVE}}(D)\}$ is a conflict-free strategy template
that is winning from $W_0$ in the game $\mathcal{G}$, where $S = \text{EDGES}(W_0, V \setminus W_0)$. Further,
$\Psi$ is computable in time $O(kn^{2d+3})$, where $n = |V|$ and $d = \max_{i<k} d_i$.

We note that the generalized Zielonka algorithm [16] for solving generalized
parity games has time complexity O(…) for a game with $n$ vertices,
$m$ edges and $k$ priority functions $P_i$ with … for each $i$. Clearly,
Algorithm 4 has a much better time complexity. However, it is not complete,
i.e., it does not always return the complete winning region. This is due to templates
not being maximally permissive and hence potentially raising conflicts
which result in additional specifications that are not actually required. The next
example shows such an incomplete instance for illustration. We however note
that Algorithm 4 returned the full winning region on all benchmarks considered
during evaluation, suggesting that such instances rarely occur in practice.


Example 4. Consider the game in Fig. 2 with objectives $\Phi_3 \wedge \Phi_4$ with $\Phi_4 = \text{Parity}(P)$,
where $P$ maps vertices $a, b, c, d, e, f$ to $0, 2, 1, 1, 1, 1$, respectively. The
winning strategy templates computed by PARITYTEMPLATE for objectives $\Phi_3$
and $\Phi_4$ are $\Psi_3 = \Psi_{\text{COLIVE}}(e_{ab}, \ldots, e_{de})$ and $\Psi_4 = \Psi_{\text{LIVE}}(\{e_{ab}, \ldots, e_{de}\})$, respectively.
The conjunction of both templates marks all outgoing edges of vertices $a$ and $d$ in
the live-group as co-live. Hence, the algorithm would ensure that these conflicted
vertices $a$ and $d$ are eventually not visited anymore. However, the only way to
satisfy $\Phi_3 \wedge \Phi_4$ is by eventually looping on vertex $a$. But this solution was skipped
by the strategy template $\Psi_4$ by putting edge $e_{ab}$ in a live-group. Therefore,
the algorithm returns the empty set as the winning region, whereas the actual
winning region is the whole vertex set.

5.2 Fault-Tolerant Strategy Adaptation


In this section we consider a 2-player parity game $\mathcal{G} = (G, \text{Parity}(P))$ and a set of
faulty Player 0 edges $F \subseteq E \cap (V^0 \times V)$ which might become unavailable during
runtime. Given a strategy template $\Psi$ for $\mathcal{G}$, we can use $\Psi' = \{\Psi, \Psi_{\text{UNSAFE}}(F)\}$ for
the (linear-time) extraction of a new strategy for the game, if $\Psi'$ is conflict-free for
$\mathcal{G}$. In this case, no re-computation is needed. If $\Psi'$ is not conflict-free for $\mathcal{G}$, then
we can remove the edges in $F$ and compute a new winning strategy template
using Algorithm 3. This is formalized in Algorithm 5, where we slightly abuse
notation and assume that PARITYTEMPLATE only outputs strategy templates.
The correctness of Algorithm 5 follows directly from Theorem 4.


Corollary 2. Given a 2-player parity game $\mathcal{G} = (G, \text{Parity}(P))$ with a strategy
template $\Psi =$ PARITYTEMPLATE($G$, $P$) and faulty edge set $F \subseteq E \cap (V^0 \times V)$, it
holds that $\Psi'$ obtained from Algorithm 5 is a winning strategy template for $\mathcal{G}|_{E \setminus F}$.


Faulty edges introduce an additional safety specification for which our templates
are maximally permissive. This implies that Algorithm 5 is sound and complete:
if there exists a winning strategy for $(G|_{E \setminus F}, \text{Parity}(P))$, Algorithm 5 finds one.

Let us now assume that $F$ collects all edges controlling vulnerable actuators
that might become unavailable. In this scenario, Algorithm 5 returns a conservative
strategy that never uses vulnerable actuators. It might however be desirable
to use actuators as long as they are available to obtain better performance. Formally,
this application scenario can be defined via a time-dependent graph whose
edges change over time, i.e., $E_t$ with $E_0 = E$ are the edges available at time
$t \in \mathbb{N}$ and $F := \{e \in E \mid e \notin E_i \text{ for some } i\}$. Given the original parity game
$\mathcal{G} = (G, \text{Parity}(P))$ with a winning strategy template $\Psi$, we can easily modify
EXTRACTSTRATEGY($G$, $\Psi$) to obtain a time-dependent strategy $\pi_0$ which reacts
to the unavailability of edges, i.e., at time $t$, $\pi_0$ takes an edge $e \in E_t \setminus (S \cup D)$ for
all vertices without any live-group, and for the ones with live-groups, it alternates
between the edges satisfying the live-groups whenever they are available,
and an edge $e \in E_t \setminus (S \cup D)$ when no live-group edge is available.

The online strategy $\pi_0$ can be implemented even without knowing when edges
are available², i.e., without knowing the time-dependent edge sequence $\{E_t\}_{t \in \mathbb{N}}$
up front. In this case $\pi_0$ is obviously winning in $\mathcal{G} = (G, \text{Parity}(P))$ if $\Psi$ is
conflict-free for $G|_{E \setminus F}$. If this is not the case, one needs to ensure that edges
that cause conflicts are always eventually available again, as formalized next.


Algorithm 5. FAULTCORRECTION($\mathcal{G}$, $\Psi$, $F$)

Input: A parity game $\mathcal{G} = (G, \text{Parity}(P))$, a strategy template $\Psi$, and a set of faulty
edges $F$
Output: A new strategy template $\Psi'$
1: $\Psi' \leftarrow \{\Psi, \Psi_{\text{UNSAFE}}(F)\}$
2: if CHECKTEMPLATE($G$, $\Psi'$) then return $\Psi'$
3: else
4:     return PARITYTEMPLATE($G|_{E \setminus F}$, $P|_{E \setminus F}$)


2 We note that it is reasonable to assume that current actuator faults are visible to
the controller at runtime, see e.g. [34] for a real water gate control example.


Definition 2. Given a parity game $\mathcal{G} = (G, \text{Parity}(P))$, we call the dynamic
edge set $\{E_i\}_{i \ge 0}$ a guaranteed availability fault (GAF) if for all plays $\rho = v_0 v_1 \ldots$
and all $v \in V$: if $v \in \inf(\rho)$, then for all $e = (v,w) \in F$ there exist infinitely many times
$t_0, t_1, \ldots$ such that $v_{t_j} = v$ and $e \in E_{t_j}$ for all $j \ge 0$.


Intuitively, guaranteed availability faults (GAF) ensure that a faulty edge is 
always eventually available when a play is in its source vertex. Under this fault, 
the following fault-correction result holds, which is proven in the full version [5, 
Appendix B.3]. 


Proposition 3. Given a game graph $G$ with a parity objective $\Phi$, a strategy
template $\Psi = \{\Psi_{\text{UNSAFE}}(S), \Psi_{\text{LIVE}}(H), \Psi_{\text{COLIVE}}(D)\}$ computed by Algorithm 3 and a
set $F = \{e \in E \mid e \notin E_i \text{ for some } i\}$ of faulty edges, the game with the objective $\Phi$
is realizable under GAF if for every vertex $v \in V^0$, there is an outgoing edge
which is not in $S \cup D \cup F$.


This proposition allows a simple linear-time algorithm to check if the templates
computed by Algorithm 3 are GAF-tolerant: check if every vertex in the
winning region has an outgoing edge which is not in $S \cup D \cup F$. If this is not the
case, the recomputation is non-trivial and is out of scope of this paper. We can
however collect the vertices which do not satisfy the above property and alert the
system engineer that these vulnerable actuators require additional maintenance
or protective hardware. Our experimental results in Sect. 6 show that conflicts
arising from actuator faults are rare and very local. Our strategy templates make
it easy to localize them, which supports their use for CPS applications.
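
A minimal sketch of this linear-time check is given below. It assumes that edges are vertex pairs and that the three edge sets are plain Python sets; the name `gaf_vulnerable_vertices` and the toy data are hypothetical and only serve as illustration.

```python
def gaf_vulnerable_vertices(edges, vertices, unsafe, colive, faulty):
    """Return the vertices in `vertices` that have no outgoing edge outside
    S, D and F (the union of unsafe, co-live and faulty edges).

    One pass over the edges plus one pass over the vertices.
    """
    blocked = unsafe | colive | faulty
    has_free_edge = set()
    for (u, v) in edges:
        if (u, v) not in blocked:
            has_free_edge.add(u)
    return {u for u in vertices if u not in has_free_edge}


# Toy instance (assumed data): both edges of "a" are blocked.
edges = [("a", "b"), ("a", "c"), ("b", "a")]
vulnerable = gaf_vulnerable_vertices(
    edges,
    vertices={"a", "b"},
    unsafe=set(),
    colive={("a", "c")},
    faulty={("a", "b")},
)
```

An empty result means the check of Proposition 3 passes for the given vertices; a non-empty result lists exactly the vertices that would be reported to the system engineer.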


6 Empirical Evaluation 


We have developed a C++-based prototype tool PESTEL³ (computing
PErmissive Strategy TEmpLates) that implements Algorithms 1-5. We have
used PESTEL to show its superior performance on the two applications
considered in Sect. 5, suggesting its practical relevance. All our experiments were
performed on a computer equipped with an Apple M1 Pro 8-core CPU and 16 GB
of RAM.


Incremental Synthesis. We used PESTEL to solve generalized parity games
both in one shot and incrementally. We compare our algorithm with existing
algorithms, i.e., GENZIEL from [16] and three partial solvers⁴ from [12], by executing


3 Repository URL: https://github.com/satya2009rta/pestel. 

4 While GENZIEL is sound and complete [16], we found different randomly generated 
games where the algorithms from [12] either return a superset or a subset of the 
winning region, hence compromising soundness and completeness. Since [12] lacks 
rigorous proof, it is not clear whether this is an implementation bug or a theoretical 
mishap, leaving soundness and completeness guarantees of these algorithms open. 

Table 1. Aggregated experimental results on generalized parity game benchmarks with
objectives given up-front (top) and one-by-one (bottom). Subrows: 1st row (mean time):
average computation time (in ms); 2nd row (incomplete): number of examples where
the corresponding tool failed to compute the complete winning region; 3rd row (faster
than): percentage of examples where PESTEL is faster than the respective tool; 4th row
(timeouts): number of examples where the respective tool timed out (10000 ms).

| Benchmark                 | Metric      | PESTEL | GENZIEL [16] | GENZIEL & GenBüchi [12] | GENZIEL & GenGoodEp [12] | GENZIEL & GenLay [12] |
|---------------------------|-------------|--------|--------------|-------------------------|--------------------------|-----------------------|
| Benchmark A (one shot)    | mean time   | 343    | 64           | 68                      | 553                      | 1224                  |
|                           | incomplete  | 0      | -            | 3                       | 3                        | 2                     |
|                           | faster than | -      | 74%          | 75%                     | 96%                      | 85%                   |
|                           | timeouts    | 0      | 0            | 0                       | 2                        | 20                    |
| Benchmark B (one shot)    | mean time   | 60     | 47           | 58                      | 112                      | 171                   |
|                           | incomplete  | 0      | -            | 28                      | 27                       | 2                     |
|                           | faster than | -      | 93%          | 93%                     | 97%                      | 95%                   |
|                           | timeouts    | 1      | 0            | 2                       | 4                        | 18                    |
| Overall (one shot)        | faster than | -      | 90%          | 90%                     | 97%                      | 94%                   |
| Benchmark B (incremental) | mean time   | 91     | 208          | 215                     | 338                      | 394                   |
|                           | incomplete  | 0      | -            | 24                      | 23                       | 2                     |
|                           | faster than | -      | 97%          | 97%                     | 98%                      | 99%                   |
|                           | timeouts    | 2      | 0            | 0                       | 8                        | 23                    |

them on a large set of benchmarks. We have generated two types of benchmarks
from the games used for the Reactive Synthesis Competition (SYNTCOMP) [21].
Benchmark A was generated by converting parity games into Streett games using
standard methods, and as each Streett pair can be represented by a {0,1,2}-priority
parity game, we represented the complete Streett objective as a conjunction
of multiple {0,1,2}-priority parity objectives, resulting in a generalized
parity game. Benchmark B was generated by adding randomly⁵ generated parity
objectives to given parity games. We considered 200 examples in Benchmark A
and more than 1400 examples in Benchmark B.

We summarize the complete set of results of the experiments in Table 1 and
Fig. 1.⁶ We performed two kinds of experiments. First, we solved every generalized
parity game in Benchmark A and B in one shot using the different methods.
The results are shown in Table 1(top) and Fig. 1(left). Although the average
time taken by PESTEL is higher than that of GENZIEL and one partial solver, it is
fastest in more than 90% of the games in both benchmarks. Thus, it shows that
PESTEL is as efficient as the other methods in most cases. Moreover, for every


5 The random generator takes three parameters: game graph “G”, number of objectives
“k”, and maximum priority “m”; it then generates “k” random parity objectives
with maximum priority “m” as follows: 50% of the vertices in “G” are selected
randomly, and those vertices are assigned priorities ranging from 0 to “m” (including 0
and “m”) such that 1/m-th (of those 50%) vertices are assigned priority 0, 1/m-th
are assigned priority 1, and so on. The remaining 50% are assigned random priorities
ranging from 0 to “m”. Hence, for every priority, there are at least 1/(2m)-th vertices
(i.e., 1/m-th of 50% vertices) with that priority.

6 See the full version of this paper [5, Appendix C] for a version of Fig. 1 including all
solvers considered in Table 1.

Fig. 4. Experimental results for parity games with faulty edges. Left: percentage of
instances with conflicts given a certain percentage of faulty edges. Right: average per- 
centage of vertices that created conflicts given a certain percentage of faulty edges. 


game in both benchmarks, PESTEL succeeded in computing the complete winning
region, whereas the partial solvers failed to do so in some cases⁷. We note that
the instances which are hard for PESTEL are those where the winning region
becomes empty, which is quickly detected by GENZIEL but only seen by PESTEL
after most objectives are (separately) considered.

Second, we solved the examples in Benchmark B by adding the objectives 
one-by-one, i.e., we solved the game with one objective, then we added one more 
objective and solved it again, and so on. The results are shown in Table 1(bottom) 
and Fig. 1(right). As PESTEL can use the pre-computed strategy templates if we 
add a new objective to a game, it outperforms all the other solvers significantly 
as they need to re-solve the game from scratch every time. 


Fault-Tolerant Control. As discussed in Sect. 5.2, strategy templates can be
used to implement a fault-tolerant time-dependent strategy, if the set of faulty
edges $F$ does not cause conflicts with the strategy template. We have used PESTEL
on over 200 examples of parity games from SYNTCOMP [21] to evaluate
the relevance of such conflicts in practice. For this, we randomly selected different
percentages of edges to be faulty and checked for conflicts with the given template.
The results are summarized in Fig. 4. The left plot shows the number of
instances for which a conflict occurs if a certain percentage of randomly selected
edges is faulty. We see that the majority of the instances never face a conflict
even when 30% of the edges are faulty. Looking more closely into the instances
with conflicts, Fig. 4(right) shows the average number of conflicting vertices in
these benchmarks. Here we see that conflicts occur very locally at a very small
number of vertices. Our strategy templates allow for a linear-time algorithm to
localize them, so that they can be mitigated in practice by additional hardware.


Remark 1. We remark again that our results are directly applicable to CPS 
with continuous dynamics via the paradigm of abstraction-based control 
design (ABCD). In particular, standard abstraction tools such as SCOTS [35], 


7 Additionally, we outperform all algorithms on the benchmarks considered by Bruyère
et al. [12]. We have however chosen to not include them in our analysis as many of 
their generalized parity games have only one objective and are therefore trivial. 
ARCS [13], MASCOT [20], P-FACES [22], or ROCS [27] automatically compute
a game graph from the (stochastic) continuous dynamics that can directly be 
used as an input to PESTEL. The winning strategy computed by PESTEL can 
further be refined into a correct-by-construction continuous feedback controller 
for the original dynamical system using standard methods from ABCD. We leave 
these tool integrations to future work. 

Abstract. Data scientists often need to write programs to process pre-
dictions of machine learning models, such as object detections and trajec- 
tories in video data. However, writing such queries can be challenging due 
to the fuzzy nature of real-world data; in particular, they often include 
real-valued parameters that must be tuned by hand. We propose a novel 
framework called QUIVR that synthesizes trajectory queries matching a 
given set of examples. To efficiently synthesize parameters, we introduce 
a novel technique for pruning the parameter space and a novel quanti- 
tative semantics that makes this more efficient. We evaluate QUIVR on 
a benchmark of 17 tasks, including several from prior work, and show 
both that it can synthesize accurate queries for each task and that our 
optimizations substantially reduce synthesis time. 


1 Introduction 


Over the past decade, deep neural networks (DNNs) have successfully solved 
challenging artificial intelligence problems [47,70]. Abstractly, these models can 
be thought of as providing interfaces to real-world data—e.g., they can pro- 
vide object classes [30,47], detections [59,60], and trajectories [10,11,83]. Then, 
these predictions are processed by programs, e.g., to identify driving patterns [5], 
events in TV broadcasts [28], or animal behaviors [67]. 

However, writing such programs can be challenging since they must still 
account for the fuzziness of real data. To do so, these programs typically include 
real-valued parameters that need to be manually tuned by the user. For exam- 
ple, consider a query over car trajectories designed to identify instances where 
one car turns in front of another. This query must capture the shape of the 
trajectory of both the turning car and the car crossing the intersection. In addi- 
tion, the user must select the appropriate maximum duration from the first car 
changing lanes to the second car crossing the intersection. Even an expert would 
require significant experimentation to determine good parameter values; in our 
experience, it can take up to an hour to tune the parameters for a single query. 


Appendices are available in the technical report [51]. 

We focus on programs that query databases of trajectories output by an
object tracker [5,7,8,28,40–42,54]. Given a video, the tracker predicts the positions
of objects in each frame (e.g., cars, people, or mice), as well as associ-
ations between detections of the same object across successive frames. Appli- 
cations often require subsequent analysis of these trajectories. For example, in 
autonomous driving, when a risky scenario is encountered, engineers typically 
search for additional examples of that driving pattern to improve their plan- 
ner [63,64,66]—e.g., cars driving too close [82] or stopping in the middle of the 
road [6]. Object tracking has also been used to track robots [58,81], animals for 
behavioral analysis [12,67,75], and basketball players for sports analytics [67,85]. 

We propose an algorithm for synthesizing queries over object trajectories 
given just a handful of input-output examples. A query takes as input a repre- 
sentation of a trajectory as a sequence of states (e.g., position, velocity, and accel- 
eration) in successive frames of the video, and outputs whether the trajectory 
matches its semantics. Our query language is based on regular expressions—in 
particular, a query is a composition of a user-extensible set of predicates using 
the sequencing, conjunction, and iteration operators. For instance, trajectories 
might correspond to cars in a video; Fig. 1 shows a query for identifying cars turn- 
ing at an intersection. As we discuss in Sect. 6, the full query language semantics 
is rich enough to subsume (variants of) Kleene algebras with tests (KAT) [46] 
and signal temporal logic (STL) [50]; however, such generality is seldom needed, 
so we use a pared-down query language that works well in practice. 

Our algorithm performs enumerative search over the space of possible queries 
to identify ones that are consistent with the given examples. A key challenge in 
our setting is that our predicates have real-valued parameters that must also be 
synthesized. Thus, our strategy enumerates sketches, which are partial programs 
that only contain holes corresponding to real-valued parameters. For each sketch, 
we search over the space of real-valued parameters, while using an efficient prun- 
ing strategy to reduce the search space. At a high level, we use a quantitative 
semantics to directly compute “boundary parameters” at which a given exam- 
ple switches from being labeled positive to negative. Then, depending on the 
target label, we can prune the entire region of the search space on one side of 
these boundary parameters. We prove that this synthesis strategy comes with 
soundness and (partial) completeness guarantees. 

We implement our approach in a system called QUIVR.¹ Our implementation
focuses on videos from fixed-position cameras. While our language and
synthesis algorithm are general, the predicates we design are tailored to specific 
settings. We evaluate QUIVR on identifying driving patterns in traffic videos, 
including ones inspired by recent work on autonomous driving [63,64,66], on 
behavior detection in a dataset of mouse trajectories [72], and on a synthetic 
task from the temporal logic synthesis literature [44]. We demonstrate how both 
our parameter pruning strategies and our query evaluation optimizations lead 
to substantial reductions in the running time of our synthesizer. 


1 QUIVR stands for QUery Induction for Video tRajectories.

(InLane1) ; (Any) ; (InLane2)


Fig. 1. (a) A video frame from a traffic camera, along with object trajectories (red) and 
manually annotated lanes (black). (b) The trajectories selected by the query (bottom), 
which selects cars turning at the intersection. (Color figure online) 


In summary, our contributions are: 


— A language for querying object trajectories (Sect.3) and an algorithm for 
synthesizing such queries from examples (Sect. 4). 

— An efficient parameter pruning approach based on a novel quantitative seman- 
tics (Sect.4), yielding a 5.0x speedup over the state-of-the-art quantitative 
pruning technique from the temporal logic synthesis literature. 

— An implementation of our approach in QUIVR, and an evaluation of QUIVR on
identifying driving behaviors in traffic camera video and mouse behaviors in 
a dataset of mouse trajectories (Sect.5), demonstrating substantially better 
accuracy than neural network baselines. 


2 Overview 


We consider a hypothetical scenario where an engineer is designing a control 
algorithm for an autonomous car and would like to identify certain driving pat- 
terns in video data. We show how they can use our framework to synthesize a 
query to identify car trajectories that exhibit a given behavior. 


Video Data. Traffic cameras are a rich source of driving behaviors [5, 13,61]; 
one dataset used in our evaluation is YTStreams [7], which includes video from 
several such cameras. Figure 1(a) shows a single frame from such a video; we
have used an object tracker [83] to identify all car trajectories (in red). 


Predicates. QUIVR assumes it is given a set of predicates that match portions of 
trajectories exhibiting behaviors of interest; during synthesis, it considers queries 
composed of these predicates. In Fig. 1(a), the engineer has manually annotated 
the lanes of interest in this video (black), to specify four InLaneK predicates that 
select trajectories of cars driving in each lane K visible in the video. Predicates 
may be configured by real-valued parameters. For example, 


(InLane1) $\wedge$ (DispLt$_\theta$)

((InLane1(A)) ; (Any) ; (InLane2(A))) $\wedge$ (InLane2(B))


Fig. 2. A single match (top) for the multi-object query (bottom) which captures one 
car, A, turning into a lane behind another car, B, that is in that lane. The trajectories 
change color from red to green as a function of time. As can be seen, the car making 
the right turn does so just after the car going straight passes through the intersection. 
(Color figure online) 


searches for trajectories where the car stays in lane 1 for a period of time and
the car has a displacement at most $\theta$ between the beginning and end of that
period. Note that atomic predicates, like (DispLt$_\theta$), can match multiple time-steps,
whereas in formalisms like regular expressions and temporal logic, atomic
predicates are over single time-steps. A key feature of our framework is that the
set of available predicates is highly extensible, and the user can provide their 
own. See Sect. 5.1 for the predicates we use in our evaluation. 


Synthesis. To specify a driving pattern, the engineer provides a small number of 
initial positive and negative examples of trajectories; then, QUIVR synthesizes 
a query that correctly labels these examples. In Fig. 1(b), we show the result of 
executing the query shown, which is synthesized to identify left turns in the data. 
Often, there are multiple queries consistent with the initial examples. While it 
may be hard for users to sift through the video for positive examples, it is usually 
easy for them to label a given trajectory. Thus, to disambiguate, QUIVR asks 
the user to label additional trajectories [19,36,62]. 


Multi-object Queries. So far, we have focused on queries that identify trajectories 
by processing each trajectory in isolation. A key feature of our framework is that 
users can express queries over multiple trajectories—for example, 


((InLanel(B)) A (ChangeLane2To1(A))) ; (InFront(A, B)). 


This query says that car B is in lane 1 while car A changes from lane 2 to lane 
1, and car A ends up in front of car B. Note that the predicates now include 
variables indicating which object they refer to, and the predicate InFront(A, B) 
refers to multiple objects. An example of a pair of trajectories selected by a 
multi-object query is shown in Fig. 2. 
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3 Query Language 


We describe our query language for matching object trajectories in videos. Our
system first preprocesses the video using an object tracker to obtain trajectories,
which are sequences $z = (x_0, x_1, \ldots, x_{n-1})$ of states $x_i \in X$. Then, a query $Q$ in
our language maps each trajectory $z$ to a value in $\mathbb{B} = \{0,1\}$ indicating whether the
query matches $z$. Our language is similar to both STL and KAT. One key difference is
that predicates are over arbitrary subsequences of $z$ rather than single states $x$.
In the main paper, we consider a simpler language, but in Appendix A we show
how it can be extended to subsume both STL and KAT.


Trajectories. We begin by describing the input to a query in our language, which 
is the representation of one or more concurrent object trajectories in a video. 

Consider a space $S$ corresponding to a single object detection in a single
video frame, e.g., $s \in S \subseteq \mathbb{R}^6$ might encode the 2D position, velocity, and
acceleration of the detected object in image coordinates. When considering $m$ concurrent objects,
let the space of states $X = S^m$, and then a trajectory $z \in Z = X^*$ is a sequence
$z = (x_0, x_1, \ldots, x_{n-1})$ of states of length $|z| = n$. We use the notation $z_{i:j} =
(x_i, x_{i+1}, \ldots, x_{j-1})$ to denote a subtrajectory of $z$.


Predicates. We assume a set of predicates $\Phi$ is given, where each predicate $\varphi \in \Phi$ 
matches trajectories $z \in Z$; we use $\mathrm{sat}_\varphi(z) \in \mathbb{B} = \{0,1\}$ to indicate that $\varphi$ 
matches $z$. As discussed below, queries in our language compose these predicates 
to match more complex patterns. 

Next, predicates in our language may have real-valued parameters that must 
be specified. We denote such a predicate $\varphi$ with parameter $\theta \in \mathbb{R}$ by $\varphi_\theta$. To enable 
our synthesis algorithm to efficiently synthesize these real-valued parameters, we 
leverage the monotonicity in all such predicates we have used in our queries. In 
particular, we assume that the semantics of these predicates have the form 


$[\![\varphi_\theta]\!](z) = \mathbb{1}(\iota_\varphi(z) \ge \theta),$ 


where $\iota_\varphi : Z \to \mathbb{R}$ is a scoring function. We also assume that the range of $\iota_\varphi$ 
is bounded (which can be achieved with a sigmoid function, if necessary). For 
example, for the predicate $\mathrm{DispLt}_\theta$, we have $\iota_{\mathrm{DispLt}}(z) = -\|x_0 - x_{n-1}\|$. Thus, 
$\iota_{\mathrm{DispLt}}(z) \ge \theta$ says the total displacement is at most $-\theta$. We describe the 
predicates we include in Sect. 5.1; they can easily be extended. 


Syntax. The syntax of our language is 

$Q ::= \varphi \mid Q ; Q \mid Q^k \mid Q \wedge Q,$ 


where $Q^k = Q ; Q ; \ldots ; Q$ ($k$ times). That is, the base case is a single predicate $\varphi$, 
and queries can be composed using sequencing ($Q ; Q$) and conjunction ($Q \wedge Q$). 
Operators for disjunction, negation, Kleene star, and STL’s “until” are discussed 
in Appendix A.2. We describe constraints imposed on our language during 
synthesis in Sect. 4.7. 


$[\![\varphi]\!](z) = \mathrm{sat}_\varphi(z)$ 
$[\![Q_1 \wedge Q_2]\!](z) = [\![Q_1]\!](z) \wedge [\![Q_2]\!](z)$ 
$[\![Q_1 ; Q_2]\!](z) = \bigvee_{k=0}^{n} [\![Q_1]\!](z_{0:k}) \wedge [\![Q_2]\!](z_{k:n})$ 

Fig. 3. Satisfaction semantics of our query language; $z \in Z$ is a trajectory of length $n$ 
and $\varphi \in \Phi$ are predicates. Iteration ($Q^k$) can be expressed as repeated sequencing. 


Semantics. The satisfaction semantics of queries have type $[\![\cdot]\!] : \mathcal{Q} \to Z \to \mathbb{B}$, 
where $\mathcal{Q}$ is the set of all queries in our language, $Z$ is the set of trajectories, and 
$\mathbb{B} = \{0,1\}$. In particular, $[\![Q]\!](z) \in \mathbb{B}$ indicates whether the query $Q$ matches 
trajectory $z$. The semantics are defined in Fig. 3. The base case of a single predicate 
$\varphi$ checks whether $\varphi$ matches $z$; conjunction $Q_1 \wedge Q_2$ checks if both conjuncts 
match; and sequencing $Q_1 ; Q_2$ checks if $z$ can be split into $z = z_{0:k} z_{k:n}$ in a way 
that $Q_1$ matches $z_{0:k}$ and $Q_2$ matches $z_{k:n}$. The semantics can be evaluated in 
time O(|Q|- n?). 
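
The semantics in Fig. 3 can be sketched as a small recursive evaluator. The Python below is a minimal illustration rather than the authors' implementation: the class names `Pred`, `Seq`, and `And`, the function `matches`, and the helper `vel_gt` are hypothetical, and trajectories are simplified to tuples holding one scalar velocity per time step. It recurses naively over all split points and makes no attempt at the more efficient evaluation behind the stated running time.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

Trajectory = Sequence[float]  # assumption: one scalar (velocity) per time step


@dataclass(frozen=True)
class Pred:
    """Base predicate: sat(z) decides whether the predicate matches z."""
    name: str
    sat: Callable[[Trajectory], bool]


@dataclass(frozen=True)
class Seq:
    """Sequencing Q1 ; Q2."""
    left: "Query"
    right: "Query"


@dataclass(frozen=True)
class And:
    """Conjunction of two queries over the same trajectory."""
    left: "Query"
    right: "Query"


Query = Union[Pred, Seq, And]


def matches(q: Query, z: Trajectory) -> bool:
    """Satisfaction semantics [[Q]](z), written as a direct recursion."""
    if isinstance(q, Pred):
        return q.sat(z)
    if isinstance(q, And):
        return matches(q.left, z) and matches(q.right, z)
    if isinstance(q, Seq):
        # Try every split z = z[0:k] z[k:n] for k = 0..n.
        return any(
            matches(q.left, z[:k]) and matches(q.right, z[k:])
            for k in range(len(z) + 1)
        )
    raise TypeError(f"unknown query node: {q!r}")


def vel_gt(theta: float) -> Pred:
    """Hypothetical predicate: a single time step with velocity >= theta."""
    return Pred(f"VelGt[{theta}]", lambda z: len(z) == 1 and z[0] >= theta)


if __name__ == "__main__":
    query = Seq(vel_gt(0.5), vel_gt(0.7))
    print(matches(query, (0.5, 0.8)))  # slow then fast
    print(matches(query, (0.9, 0.6)))  # fast then slow
```

Because `vel_gt` only matches length-1 trajectories, only the split at k = 1 can succeed on a two-step trajectory; the empty prefix and empty suffix both fail.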


4 Synthesis Algorithm 


We describe our algorithm for synthesizing queries consistent with a given set 
of examples. It performs a syntax-guided enumerative search over the space of 
possible queries [3]. In more detail, it enumerates sketches, which are partial 
programs where only parameter values are missing. For each sketch, it uses a 
quantitative pruning strategy to compute the subset of the input parameters for 
which the resulting query is consistent with the given examples. A key contribu- 
tion is how our algorithm uses quantitative semantics for quantitative pruning. 


4.1 Problem Formulation 


Partial Queries. A partial query is in the grammar 


$Q ::= {??} \mid \varphi_{??} \mid \varphi \mid Q ; Q \mid Q^k \mid Q \wedge Q.$ 


Note that there are two kinds of holes: (i) a predicate hole $h = {??}$ that can be 
filled by a sub-query $Q$, and (ii) a parameter hole $h = {??}$ that can be filled by a 
real value $\theta_h \in \mathbb{R}$. We denote the predicate holes of $Q$ by $H_\varphi(Q)$, the parameter 
holes by $H_\theta(Q)$, and let $H(Q) = H_\varphi(Q) \cup H_\theta(Q)$. A partial query $Q$ is a sketch 
(denoted $Q \in \mathcal{Q}_{\mathrm{sketch}}$) [71] if $H_\varphi(Q) = \emptyset$, and is complete (denoted $Q \in \mathcal{Q}$) 
if $H(Q) = \emptyset$. For example, for $Q = \langle \mathrm{DispLt}_{??1} \rangle \wedge {??2}$, we have $H_\theta(Q) = \{??1\}$ 
and $H_\varphi(Q) = \{??2\}$. (We label each hole $h = {??}i$ with an identifier $i \in \mathbb{N}$ to 
distinguish them.) 


Refinements and Completions. Given a partial query $Q$, a predicate hole $h \in H_\varphi(Q)$, 
and a production $R = Q \to f(Q_1, \ldots, Q_k)$, we can fill $h$ with $R$ (denoted $Q' = 
\mathrm{fill}(Q, h, R)$) by replacing $h$ with $f(??1, \ldots, ??k)$, where each $??i$ is a fresh hole, 
and similarly given a parameter hole $h \in H_\theta(Q)$ and a value $\theta_h \in \mathbb{R}$. We call $Q'$ 
a child of $Q$ (denoted $Q \to Q'$). Next, we call $Q''$ a refinement of $Q$ (denoted 
$Q \xrightarrow{*} Q''$) if there exists a sequence $Q \to \cdots \to Q''$; if furthermore $Q'' \in \mathcal{Q}$, we 
say it is a completion of $Q$. For example, we have 


$??1 \;\to\; ??2 \,;\, ??3 \;\to\; \langle \mathrm{InLane1} \rangle \,;\, ??3 \;\to\; \cdots.$ 


Here, $\langle \mathrm{InLane1} \rangle \,;\, ??3$ is a child (and refinement) of $??2 \,;\, ??3$ obtained by filling 
$??2$ with $Q \to \langle \mathrm{InLane1} \rangle$, i.e., 


$\langle \mathrm{InLane1} \rangle \,;\, ??3 = \mathrm{fill}(??2 \,;\, ??3,\ ??2,\ Q \to \langle \mathrm{InLane1} \rangle).$ 
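
Parameters. We let $\theta \in \mathbb{R}^{|H_\theta(Q)|}$ denote a choice of parameters for each $h \in 
H_\theta(Q)$, let $\theta_h \in \Theta_h \subseteq \mathbb{R}$ denote the parameter for hole $h$, and let $Q_\theta$ denote the 
query obtained by filling each $h \in H_\theta(Q)$ with $\theta_h$. Note that if $Q \in \mathcal{Q}_{\mathrm{sketch}}$, then 
$Q_\theta \in \mathcal{Q}$ is complete. For example, consider the sketch 


$Q = \langle \mathrm{DispLt}_{??1} \rangle \wedge \langle \mathrm{MinLength}_{??2} \rangle.$ 


This query has two holes, so its parameters are $\theta \in \mathbb{R}^2$. If $\theta = (3.2, 5.0)$, then 
$\theta_{??1} = 3.2$ is used to fill hole $??1$ and $\theta_{??2} = 5.0$ is used to fill $??2$. In particular, 


$Q_\theta = \langle \mathrm{DispLt}_{3.2} \rangle \wedge \langle \mathrm{MinLength}_{5.0} \rangle.$ 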


Query Synthesis Problem. Given examples $W \subseteq \mathcal{W} = Z \times \mathbb{B}$, where $\mathbb{B} = \{0,1\}$, 
our goal is to find a query $Q \in \mathcal{Q}$ that correctly labels these examples, i.e., 


$\psi_W(Q) = \bigwedge_{(z,y) \in W} \big( [\![Q]\!](z) = y \big).$ 


Thus, $\psi_W(Q)$ indicates whether $Q$ is consistent with the labeled examples $W$. 
Our goal is to devise a synthesis algorithm that is sound and complete, i.e., it 
finds a query that satisfies $\psi_W(Q) = 1$ if and only if one exists. 
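
The consistency condition is simply a conjunction over the labeled examples. A minimal sketch in Python, assuming a completed query is represented as a boolean function on trajectories and a sketch as a function from parameters to such a query; these representations and the names `is_consistent` and `vel_sketch` are illustrative assumptions, not part of the system described here.

```python
from typing import Callable, Iterable, Sequence, Tuple

Trajectory = Sequence[float]
CompleteQuery = Callable[[Trajectory], bool]
Example = Tuple[Trajectory, bool]


def is_consistent(query: CompleteQuery, examples: Iterable[Example]) -> bool:
    """psi_W(Q): True iff the query reproduces every label in W."""
    return all(query(z) == y for z, y in examples)


def vel_sketch(theta: Tuple[float, float]) -> CompleteQuery:
    """Hypothetical two-hole sketch on two-step trajectories:
    first velocity >= theta[0], then second velocity >= theta[1]."""
    return lambda z: len(z) == 2 and z[0] >= theta[0] and z[1] >= theta[1]


if __name__ == "__main__":
    # One negative example (fast then slow) and one positive (slow then fast).
    W = [((0.9, 0.6), False), ((0.5, 0.8), True)]
    print(is_consistent(vel_sketch((0.5, 0.7)), W))
    print(is_consistent(vel_sketch((0.5, 0.6)), W))
```

With these two examples, the parameters (0.5, 0.7) reject the negative trajectory and accept the positive one, whereas (0.5, 0.6) also accept the negative trajectory and are therefore inconsistent.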


4.2 Algorithm Overview 


Our algorithm enumerates sketches $Q \in \mathcal{Q}_{\mathrm{sketch}}$; for each one, it tries to compute 
parameter values $\theta$ such that the completed query $Q_\theta$ is consistent with $W$, i.e., 
$\psi_W(Q_\theta) = 1$. It can either stop once it has found a consistent query, or identify 
additional queries that are consistent with $W$. Algorithm 1 shows this high-level 
strategy: at each iteration, it selects a sketch $Q$, determines a region $B$ of the 
parameter space containing consistent parameters $\theta \in B$, and adds $(Q, B)$ to a 
list of consistent queries that solve the synthesis problem. 

The key challenge is searching over the space of continuous parameters $\theta$ 
for a given sketch $Q$ such that $Q_\theta$ is consistent with $W$. For efficiency, we rely 
heavily on pruning the search space. At a high level, consider evaluating a single 
candidate parameter $\theta$ on a single example $(z, y) \in W$, i.e., checking whether 
$[\![Q_\theta]\!](z) = y$. If this condition does not hold, then we can prune not only $\theta$ from 
the search space, but also a significant fraction of additional candidates. For 
instance, suppose $[\![Q_\theta]\!](z) = 1$ but $y = 0$; if $\theta' \le \theta$ (in all components), then by 
a monotonicity property we prove for our semantics, we also have $[\![Q_{\theta'}]\!](z) = 1$. 
Thus, we can also prune $\theta'$. 

Algorithm 1. Synthesizes consistent queries using the subroutine in Algorithm 2 
procedure SynthesizeQuery($W$) 
    $\mathcal{Q}_{\mathrm{con}} \leftarrow \emptyset$ 
    for $Q \in \mathcal{Q}_{\mathrm{sketch}}$ do 
        $B \leftarrow$ SynthesizeParameters($W$, $Q$) 
        $\mathcal{Q}_{\mathrm{con}} \leftarrow \mathcal{Q}_{\mathrm{con}} \cup \{(Q, B)\}$ 
    return $\mathcal{Q}_{\mathrm{con}}$ 

Previous work has leveraged this property to prune the search space [49,53, 
78]. Using a strategy based on binary search, for a given example $(z, y) \in W$, we 
can identify “boundary” parameters $\theta$ to accuracy $\varepsilon$ in $O(\log(1/\varepsilon))$ steps, i.e., 
compute $\theta$ for which $[\![Q_{\theta-\varepsilon}]\!](z) = 1$ and $[\![Q_{\theta+\varepsilon}]\!](z) = 0$. 
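
The binary-search baseline can be sketched for a single parameter as follows. This is an illustrative Python sketch under stated assumptions, not code from the cited work: the function name `boundary_by_bisection` is hypothetical, and `sat` stands for the map from a parameter value to the satisfaction result on one fixed trajectory, assumed monotonically decreasing with `sat(lo)` true and `sat(hi)` false.

```python
from typing import Callable, Tuple


def boundary_by_bisection(
    sat: Callable[[float], bool], lo: float, hi: float, eps: float
) -> Tuple[float, int]:
    """Locate the parameter where a monotonically decreasing `sat` flips
    from True to False, to within `eps`.

    Returns the largest parameter found to satisfy `sat`, together with the
    number of bisection steps taken.
    """
    if not sat(lo) or sat(hi):
        raise ValueError("need sat(lo) == True and sat(hi) == False")
    steps = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if sat(mid):
            lo = mid  # boundary lies at or above mid
        else:
            hi = mid  # boundary lies below mid
        steps += 1
    return lo, steps


if __name__ == "__main__":
    # Hypothetical single-step trajectory with velocity 0.8 and predicate
    # "velocity >= theta": the true boundary parameter is 0.8.
    theta, steps = boundary_by_bisection(lambda t: 0.8 >= t, 0.0, 1.0, 1e-3)
    print(theta, steps)
```

Each step halves the interval, so the number of evaluations of the semantics grows as log(1/eps); this repeated evaluation is the cost that a direct computation of the boundary avoids.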

Our algorithm avoids this binary search process, which can lead to a significant 
speedup in practice. The key idea is to devise a quantitative semantics for 
queries that directly computes $\theta$; in fact, this quantitative semantics is closely 
related to robust temporal logic semantics, where the conjunction and disjunction 
of the satisfaction semantics are replaced with minimum and maximum, 
respectively. 


4.3 Pruning with Boundary Parameters 


We begin by describing how “boundary parameters” can be used to prune a 
portion of the search space over parameters. First, for any candidate parameters 
$\theta$, we can prune parameters $\theta' \le \theta$ (if $[\![Q_\theta]\!](z) = 1$ and $y = 0$) or $\theta' \ge \theta$ (if 
$[\![Q_\theta]\!](z) = 0$ and $y = 1$). Pruned regions of the parameter space take the form of 
hyper-rectangles, which we call boxes. For convenience, let $\infty := (\infty, \ldots, \infty)$. 


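Definition 1. Given $x, y \in \overline{\mathbb{R}}^d$, where $\overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$, a box is an axis-aligned 
half-open hyper-rectangle $\lfloor x, y \rceil := \{v \mid x_i < v_i \le y_i\} \subseteq \overline{\mathbb{R}}^d$. 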


The key property ensuring that parameters prune boxes of the search space is 
that the semantics are monotonically decreasing in $\theta$. 


Lemma 1. Given sketch $Q$, trajectory $z$, and two candidate parameters $\theta, \theta' \in 
\mathbb{R}^d$ such that $\theta \le \theta'$ component-wise, we have $[\![Q_\theta]\!](z) \ge [\![Q_{\theta'}]\!](z)$. 


The proof follows by structural induction on the query semantics: the base case 
follows since the semantics $\mathbb{1}(\iota_\varphi(z) \ge \theta_k)$ for predicates is monotonically decreasing 
in $\theta_k$, and the inductive case follows since conjunction and disjunction are 
monotonically increasing in their inputs (so they are also monotonically decreasing 
in $\theta_k$). Below, we show how monotonicity ensures that we can prune whole 
regions of the search space if we find boundary parameters. 
As an example, suppose we have two trajectories, $z_0$ of a car driving quickly 
and then slowly, and $z_1$ of a car driving slowly and then quickly, and that we are 
trying to synthesize a query for $W = \{(z_0, 0), (z_1, 1)\}$. For simplicity, we assume 
both $z_0 = (0.9, 0.6)$ and $z_1 = (0.5, 0.8)$ have just two time steps each, with just 
a single component representing velocity. Furthermore, we assume there is just 
a single predicate $\langle \mathrm{VelGt}_\theta \rangle$ matching time steps where the velocity is at least $\theta$, 
where $\theta$ is a real-valued parameter. Since $\langle \mathrm{VelGt}_\theta \rangle$ matches single time steps, 
the satisfaction semantics is 0 except on trajectories of length 1, so: 


$\iota_{\mathrm{VelGt}}((z_0)_{0:1}) = 0.9 \qquad \iota_{\mathrm{VelGt}}((z_0)_{1:2}) = 0.6 \qquad \iota_{\mathrm{VelGt}}((z_0)_{0:2}) = -\infty$ 
$\iota_{\mathrm{VelGt}}((z_1)_{0:1}) = 0.5 \qquad \iota_{\mathrm{VelGt}}((z_1)_{1:2}) = 0.8 \qquad \iota_{\mathrm{VelGt}}((z_1)_{0:2}) = -\infty$ 


Consider the sketch $Q = \langle \mathrm{VelGt}_{??1} \rangle \,;\, \langle \mathrm{VelGt}_{??2} \rangle$. We can see that the candidate 
parameters $(0.5, 0.6)$ satisfy $[\![Q_{(0.5,0.6)}]\!](z_1) = 1$: 


$[\![Q_{(0.5,0.6)}]\!](z_1) = \bigvee_{k=0}^{2} [\![\langle \mathrm{VelGt}_{0.5} \rangle]\!]((z_1)_{0:k}) \wedge [\![\langle \mathrm{VelGt}_{0.6} \rangle]\!]((z_1)_{k:2})$ 
$\qquad = [\![\langle \mathrm{VelGt}_{0.5} \rangle]\!]((z_1)_{0:1}) \wedge [\![\langle \mathrm{VelGt}_{0.6} \rangle]\!]((z_1)_{1:2})$ 
$\qquad = \mathbb{1}(0.5 \ge 0.5) \wedge \mathbb{1}(0.8 \ge 0.6)$ 
$\qquad = 1,$ 


where the second equality holds because $\langle \mathrm{VelGt}_\theta \rangle$ matches only length-1 trajectories, 
so the $k = 0$ and $k = 2$ cases evaluate to 0. Since the semantics are monotonically 
decreasing, we have $[\![Q_\theta]\!](z_1) = 1$ for any $\theta \in \lfloor (-\infty, -\infty), (0.5, 0.6) \rceil$. 

Notice, however, that if we were to move upward by any $\varepsilon > 0$, we would have 
$[\![Q_{(0.5+\varepsilon_1, 0.6+\varepsilon_2)}]\!](z_1) = \mathbb{1}(0.5 \ge 0.5 + \varepsilon_1) \wedge \mathbb{1}(0.8 \ge 0.6 + \varepsilon_2) = 0$. So we know 
$[\![Q_\theta]\!](z_1) = 0$ for any $\theta \in \lfloor (0.5, 0.6), (\infty, \infty) \rceil$. This is because $(0.5, 0.6)$ lies on the 
boundary between $\{\theta \mid [\![Q_\theta]\!](z_1) = 0\}$ and $\{\theta \mid [\![Q_\theta]\!](z_1) = 1\}$. This boundary 
plays a key role in our algorithm. 


Definition 2. Given a sketch $Q$ with $d$ parameter holes and a trajectory $z$, we 
say $\theta \in \mathbb{R}^d \cup \{\bot, \top\}$ is a boundary parameter if one of the following holds: 


- $\theta \in \mathbb{R}^d$ and $[\![Q_\theta]\!](z) = 1$, but $[\![Q_{\theta'}]\!](z) = 0$ for all $\theta' \in \lfloor \theta, \infty \rceil$ 
- $\theta = \bot$ and $[\![Q_{\theta'}]\!](z) = 0$ for all $\theta' \in \lfloor -\infty, \infty \rceil$ 
- $\theta = \top$ and $[\![Q_{\theta'}]\!](z) = 1$ for all $\theta' \in \lfloor -\infty, \infty \rceil$ 


In the first case, by monotonicity, we also have $[\![Q_{\theta'}]\!](z) = 1$ for all $\theta' \in \lfloor -\infty, \theta \rceil$; 
thus, $\theta$ lies on the boundary between parameters $\theta'$ where $Q_{\theta'}$ evaluates to 1 and 
those where it evaluates to 0. The second and third cases are where $Q_\theta$ always 
evaluates to 0 and 1, respectively. 

Given a boundary parameter $\theta$ for an example $(z, y) \in W$, we can prune 
$\lfloor \theta, \infty \rceil$ if $y = 1$ or $\lfloor -\infty, \theta \rceil$ if $y = 0$. Intuitively, boundary parameters 
provide optimal pruning along a fixed direction in the parameter space. Thus, our 
algorithm focuses on computing boundary parameters for pruning. 
Fig. 4. (a) shows a boundary parameter, $\theta_1$, for $z_1$, and a region that is inconsistent 
with $z_1$ and can be pruned (red), as well as a region that is consistent with it (blue). (b) 
similarly shows a boundary parameter $\theta_0$ for $z_0$. (c) shows the pruning pair composed 
of $\theta_0$ and $\theta_1$, a region consistent with both (blue), and regions inconsistent with either 
(red). (d) is the same as (c), but with $\theta_0$ and $\theta_1$ swapped. The labels $b_0$ through $b_8$ 
denote analogous boxes in (c) & (d). (e) shows how, if (d) were the result of the first 
step of search and bg were chosen next, search could proceed. (f) shows ground truth 
consistent (blue) and inconsistent (red) regions that the search process in (d) & (e) 
might converge toward. (Color figure online) 


In Fig. 4(a), if $\theta_1$ is a boundary parameter for $z_1$, we know that every parameter in the 
blue region satisfies $z_1$, and thus is consistent with the label 1, while the red region 
does not satisfy $z_1$, and thus is inconsistent with the label 1. Similarly, in Fig. 4(b), if 
$\theta_0$ is a boundary parameter for $z_0$, we know that the red region satisfies $z_0$, and thus is 
inconsistent with the label 0, while the blue region does not satisfy $z_0$, and thus is 
consistent with the label 0. 


4.4 Pruning with Pairs of Boundary Parameters 


To extend pruning to the entire dataset W, we could simply prune the union of 
the individual pruned regions for each (z, y) € W. However, one important fea- 
ture of our approach is that we can also establish regions of the parameter space 
where the parameters are guaranteed to be consistent with W. To formalize this 
idea, we introduce the concept of a “pruning pair”, which is a pair of boundary 
parameters which might allow us to find such a consistent region. 


Definition 3. Given a sketch $Q$ and a dataset $W$, a pair of boundary parameters 
$\theta^-, \theta^+ \in \mathbb{R}^d \cup \{\bot, \top\}$ is a pruning pair for $Q$ and $W$ if all of the following hold: 


- $\theta^+$ is a boundary parameter for some $z \in W^+$ and, for all $z' \in W^+$ such that 
$z' \ne z$, we have $[\![Q_{\theta^+}]\!](z') = 1$. 

- $\theta^-$ is a boundary parameter for some $z \in W^-$ and, for all $z' \in W^-$ such that 
$z' \ne z$, we have $[\![Q_{\theta^-}]\!](z') = 0$. 

- $\theta^- \le \theta^+$ or $\theta^- \ge \theta^+$. 


If $\theta^- \le \theta^+$, the pruning pair $(\theta^-, \theta^+)$ is consistent, and inconsistent otherwise. 


Our algorithm searches for pruning pairs along a fixed direction; i.e., it considers 
a curve $L \subseteq \mathbb{R}^d$ and looks for the following pruning pair along $L$: 


$\theta^+ = \sup\Big\{\theta \in L \;\Big|\; \bigwedge_{z \in W^+} \big([\![Q_\theta]\!](z) = 1\big)\Big\}, \qquad \theta^- = \inf\Big\{\theta \in L \;\Big|\; \bigwedge_{z \in W^-} \big([\![Q_\theta]\!](z) = 0\big)\Big\}.$ 


Intuitively, $\theta^+$ is the largest $\theta$ that correctly classifies all positive examples, and 
conversely for $\theta^-$. We restrict to curves $L$ that are monotonically increasing in 
all components, in which case the supremum and infimum are well defined since 
$L$ comes with a total ordering (from its smallest point to its largest) that is 
consistent with the standard partial order on $\mathbb{R}^d$. Then, $(\theta^-, \theta^+)$ form a pruning 
pair: since $\theta^+$ is a boundary parameter for $z$, if we take $\theta^+$ to be any larger, then 
we must have $[\![Q_{\theta^+}]\!](z) = 0$ for some $z \in W^+$, and similarly for $\theta^-$. 

Given a curve $L$, we can compute an approximation to $\theta^+$ and $\theta^-$ via binary 
search. However, our algorithm avoids the need to do so by directly computing 
$\theta^+$ and $\theta^-$ using a quantitative semantics, which we describe in Sect. 4.6. 

Figure 4(c) shows how the pair of boundary parameters $\theta_0$ for $z_0$ and $\theta_1$ for 
$z_1$ (where $L$ is the diagonal line) prunes the parameter space. The blue region is 
guaranteed to be consistent with $W$, as it is the intersection of the region below 
$\theta^+$, which must satisfy $[\![Q_\theta]\!](z_1) = 1$, and the region above $\theta^-$, which must 
satisfy $[\![Q_\theta]\!](z_0) = 0$. Conversely, the red regions are inconsistent with either $z_0$ 
or $z_1$, and therefore with $W$. Thus, the red regions can be pruned, whereas the 
blue regions are solutions to our synthesis problem. Note that the red region is 
the union of the red regions in Fig. 4(a) and (b), whereas the blue region is the 
intersection of the blue regions in Fig. 4(a) and (b). 

This pattern holds for any consistent pruning pair ($\theta^- \le \theta^+$); if instead 
the pair is inconsistent ($\theta^- > \theta^+$), then the resulting pattern is illustrated in 
Fig. 4(d); in this case, we can prune the red regions as before, but there is no blue 
region of solutions. In general, for a $d$-dimensional parameter space, a pruning 
pair divides the parameter space into $3^d$ boxes (i.e., for each dimension, the 
box can be below, in line with, or above the center box). The regions below 
$\theta^-$ and above $\theta^+$ can be pruned, and the region between $\theta^-$ and $\theta^+$ (if one 
exists) contains synthesis solutions. Precisely, it follows from the definitions and 
monotonicity that: 


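Lemma 2. Every $\theta \in \lfloor -\infty, \theta^- \rceil$ and $\theta \in \lfloor \theta^+, \infty \rceil$ is inconsistent with $W$, and 
every $\theta \in \lfloor \theta^-, \theta^+ \rceil$ is consistent with $W$. 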


The remaining boxes need to be further analyzed by our algorithm. 
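
The division of a box into $3^d$ sub-boxes around a pruning pair can be sketched as below. This Python is an illustration under assumptions rather than the authors' implementation: the function name `split_box` and the category labels are hypothetical, boxes are represented only by their two corner points, and for $d > 2$ the rule used here for separating the undetermined boxes (a sub-box is "incomp" when it lies below the pair in one dimension and above it in another, "extra" otherwise) is this sketch's own reading of the description.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)


def split_box(
    lo: Sequence[float],
    hi: Sequence[float],
    theta_minus: Sequence[float],
    theta_plus: Sequence[float],
) -> Dict[str, List[Box]]:
    """Split the box (lo, hi) into 3^d sub-boxes around a pruning pair."""
    d = len(lo)
    a = [min(m, p) for m, p in zip(theta_minus, theta_plus)]
    b = [max(m, p) for m, p in zip(theta_minus, theta_plus)]
    # Per dimension: three consecutive intervals [lo, a], [a, b], [b, hi].
    cuts = [(lo[i], a[i], b[i], hi[i]) for i in range(d)]
    parts: Dict[str, List[Box]] = {
        "center": [], "lower": [], "upper": [], "incomp": [], "extra": []
    }
    for idx in product(range(3), repeat=d):
        box = (
            tuple(cuts[i][j] for i, j in enumerate(idx)),
            tuple(cuts[i][j + 1] for i, j in enumerate(idx)),
        )
        if all(j == 1 for j in idx):
            kind = "center"   # between the two boundary parameters
        elif all(j == 0 for j in idx):
            kind = "lower"    # below both in every dimension
        elif all(j == 2 for j in idx):
            kind = "upper"    # above both in every dimension
        elif 0 in idx and 2 in idx:
            kind = "incomp"   # incomparable with the pair (assumption for d > 2)
        else:
            kind = "extra"
        parts[kind].append(box)
    return parts


if __name__ == "__main__":
    parts = split_box((0.0, 0.0), (1.0, 1.0), (0.6, 0.6), (0.5, 0.5))
    for kind, boxes in parts.items():
        print(kind, len(boxes))
```

In two dimensions this yields one center box, one lower box, one upper box, two incomparable corner boxes, and four remaining boxes, for nine in total.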


4.5 Pruning Parameter Search Algorithm 


Next, we describe Algorithm 2, which searches over the space of parameters to 
fill a sketch Q for a given dataset W. The algorithm uses a subroutine that takes 
a box and returns a pruning pair in that box, which we describe in Sect. 4.6. 
Given this subroutine, our algorithm maintains a work-list of “unknown” boxes 
(i.e., unknown whether parameters in these boxes are consistent or inconsistent 
with W). At each iteration, it pops a box from the work-list (in first-in-first-out 
order), uses the given subroutine to find a pruning pair inside that box, applies 
the pruning procedure described in the previous section, and then adds each new 
unknown box to the work-list. 

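Algorithm 2. Synthesizes consistent parameters for a given sketch 
procedure SynthesizeParameters($W$, $Q$) 
    $B_{\mathrm{con}} \leftarrow \emptyset$; $B_{\mathrm{inc}} \leftarrow \emptyset$; $B_{\mathrm{unk}} \leftarrow \{b_{\mathrm{initial}}\}$ 
    for $i \in \{1, \ldots, N\}$ do 
        $b \leftarrow$ Pop($B_{\mathrm{unk}}$) 
        $\theta^-, \theta^+ \leftarrow$ ComputePruningPair($W$, $Q$, $b$) 
        $b_{\mathrm{center}} \leftarrow \lfloor \min\{\theta^-, \theta^+\}, \max\{\theta^-, \theta^+\} \rceil$ 
        $b_{\mathrm{lower}}, b_{\mathrm{upper}}, B_{\mathrm{incomp}}, B_{\mathrm{extra}} \leftarrow$ SplitBox($b$, $b_{\mathrm{center}}$) 
        if $\theta^- \le \theta^+$ then 
            $B_{\mathrm{con}} \leftarrow B_{\mathrm{con}} \cup \{b_{\mathrm{center}}\}$ 
            $B_{\mathrm{inc}} \leftarrow B_{\mathrm{inc}} \cup \{b_{\mathrm{lower}}, b_{\mathrm{upper}}\}$ 
            $B_{\mathrm{unk}} \leftarrow B_{\mathrm{unk}} \cup B_{\mathrm{incomp}} \cup B_{\mathrm{extra}}$ 
        else if $\theta^- > \theta^+$ then 
            $B_{\mathrm{inc}} \leftarrow B_{\mathrm{inc}} \cup \{b_{\mathrm{center}}, b_{\mathrm{lower}}, b_{\mathrm{upper}}\} \cup B_{\mathrm{extra}}$ 
            $B_{\mathrm{unk}} \leftarrow B_{\mathrm{unk}} \cup B_{\mathrm{incomp}}$ 
    return $B_{\mathrm{con}}$ 


For the last step, the current box $b$ is divided into $3^d$ smaller boxes. The 
box $b_{\mathrm{center}} := \lfloor \min\{\theta^-, \theta^+\}, \max\{\theta^-, \theta^+\} \rceil$ is pruned (added to $B_{\mathrm{inc}}$) if the 
pair $(\theta^-, \theta^+)$ is inconsistent, and contains solutions to the synthesis problem 
otherwise (added to $B_{\mathrm{con}}$). The boxes $b_{\mathrm{lower}} = \lfloor -\infty, \min\{\theta^-, \theta^+\} \rceil$ and 
$b_{\mathrm{upper}} = \lfloor \max\{\theta^-, \theta^+\}, \infty \rceil$ are always pruned. The boxes $b \in B_{\mathrm{incomp}}$ are the 
remaining corners of $b$, and always have indeterminate consistency (added to 
$B_{\mathrm{unk}}$). The remaining boxes $b \in B_{\mathrm{extra}}$ are indeterminate if $(\theta^-, \theta^+)$ is consistent, 
and inconsistent otherwise. In our example, if the first step of the algorithm 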
yielded Fig. 4(d), then the second step might pop bg and yield Fig. 4(e). 
The following soundness result follows directly from Lemma 2. 


Theorem 1. In Algorithm 2, every $\theta \in B_{\mathrm{con}}$ is consistent with $W$ for $Q$, and 
every $\theta \in B_{\mathrm{inc}}$ is inconsistent. 


In addition, the algorithm is complete for almost all parameters: 


Theorem 2. The Lebesgue measure of $\{\theta \in b \mid b \in B_{\mathrm{unk}}\}$ tends to 0 as $N \to \infty$. 


See Appendix D.1 for the proof. In other words, all parameters outside a sub- 
set of measure zero are eventually classified as consistent or inconsistent; intu- 
itively, the parameters that may never be classified are the ones along the deci- 
sion boundary. This result holds since at any search depth, the fraction of the 
parameter space pruned can be lower-bounded. 


4.6 Computing Pruning Pairs via Quantitative Semantics 


The pruning algorithm depends on the ability to compute, given a box $b$, a 
pruning pair $(\theta^-, \theta^+)$ on the restriction of the parameter space to $b$. Recall 
that $\theta^+$ must be a boundary parameter for some $z^+ \in W^+$ and must satisfy 
$[\![Q_{\theta^+}]\!](z) = 1$ for all other $z \in W^+$, and $\theta^-$ must be a boundary parameter for 
some $z^- \in W^-$, and must satisfy $[\![Q_{\theta^-}]\!](z) = 0$ for all other $z \in W^-$. 

Given a box $b = \lfloor \theta_{\min}, \theta_{\max} \rceil$, our algorithm takes $L \subseteq \mathbb{R}^d$ to be the diagonal 
from $\theta_{\min}$ to $\theta_{\max}$ and computes the pruning pair along $L$. We can naively 
use binary search: for $\theta^+$, we search for the parameters where $\bigwedge_{z \in W^+} [\![Q_\theta]\!](z)$ 
changes value along $L$, and similarly for $\theta^-$ and $\bigwedge_{z \in W^-} \neg [\![Q_\theta]\!](z)$. 

$[\![\varphi_{??i}]\!]^q_{v,u}(z) = \dfrac{\iota_\varphi(z) - v_i}{u_i}$ 
$[\![\varphi]\!]^q_{v,u}(z) = \infty$ if $\mathrm{sat}_\varphi(z) = 1$, and $-\infty$ if $\mathrm{sat}_\varphi(z) = 0$ 
$[\![Q_1 \wedge Q_2]\!]^q_{v,u}(z) = \min\{[\![Q_1]\!]^q_{v,u}(z),\ [\![Q_2]\!]^q_{v,u}(z)\}$ 
$[\![Q_1 ; Q_2]\!]^q_{v,u}(z) = \max_{0 \le k \le n} \min\{[\![Q_1]\!]^q_{v,u}(z_{0:k}),\ [\![Q_2]\!]^q_{v,u}(z_{k:n})\}$ 

Fig. 5. The quantitative semantics of our language, taking a sketch $Q$, trajectory $z$, 
parameter $v \in \mathbb{R}^d$, and positive vector $u \in \mathbb{R}^d_{>0}$; $n$ is the length of $z$. 

Instead, by leveraging a quantitative semantics, we can directly compute 
$\theta^+$ and $\theta^-$, thereby reducing computation time substantially. Given a sketch 
$Q$, trajectory $z$, parameter $v \in \mathbb{R}^d$, and positive vector $u \in \mathbb{R}^d_{>0}$, we devise a 
quantitative semantics $[\![Q]\!]^q_{v,u}(z) \in \overline{\mathbb{R}}$ such that the parameter $[\![Q]\!]^q_{v,u}(z) \cdot u + v$ 
is a boundary parameter. Intuitively, this semantics computes, starting at $v$, 
how many $u$-sized steps must be taken to reach the boundary. (For the uses 
in our algorithm, the number of steps is always in $[0, 1]$.) Then, for a box $b = 
\lfloor \theta_{\min}, \theta_{\max} \rceil$, we can take $v = \theta_{\min}$ and $u = \theta_{\max} - \theta_{\min}$, and compute 


$\theta^+ = \Big(\min_{z \in W^+} [\![Q]\!]^q_{v,u}(z)\Big) \cdot u + v, \qquad \theta^- = \Big(\max_{z \in W^-} [\![Q]\!]^q_{v,u}(z)\Big) \cdot u + v.$ 


We define the quantitative semantics in Fig. 5. The base case of $\varphi_{??i}$ adjusts and 
rescales $\iota_\varphi$ by $v$ and $u$, and the other cases replace conjunction and disjunction in 
the satisfaction semantics with minimum and maximum. We have the following 
key result (where $\infty \cdot u := \top$, $-\infty \cdot u := \bot$, $\top + v := \top$, and $\bot + v := \bot$): 


Theorem 3. For a sketch $Q$, trajectory $z$, parameter $v \in \mathbb{R}^d$, and positive vector 
$u \in \mathbb{R}^d_{>0}$, we have that $[\![Q]\!]^q_{v,u}(z) \cdot u + v$ is a boundary parameter of $z$ for $Q$. 


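See Appendix D.2 for the full proof. For intuition, consider $\theta_{\min} = \vec{0}$ and 
$\theta_{\max} = \vec{1}$ (i.e., the current box $b \subseteq \mathbb{R}^d$ is the unit hypercube). Then, $v = \vec{0}$ and 
$u = \vec{1}$, so $[\![Q]\!]^q_{v,u}$ reduces to the standard max-min quantitative semantics for temporal 
logic [25]. 

Now, if we consider the satisfaction semantics of a base predicate $[\![\varphi_{\theta_i}]\!](z) = 
\mathbb{1}(\iota_\varphi(z) \ge \theta_i)$, then the value of $\theta_i$ where the semantics flips is just $\iota_\varphi(z)$. So 
any parameter with $i$-th component $\iota_\varphi(z)$ is a boundary parameter, and since 
$L$ has the same slope in all dimensions, the boundary parameter along $L$ is 
$\iota_\varphi(z) \cdot \vec{1} + \vec{0} = [\![\varphi_{??i}]\!]^q_{\vec{0},\vec{1}}(z) \cdot \vec{1} + \vec{0}$. 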

In the inductive cases, it suffices to show that we can replace conjunction and 
disjunction with minimum and maximum in the semantics. Since the satisfaction 
semantics is monotonically decreasing, as we move upward along $L$, at some point 
we will transition from 1 to 0. A conjunction becomes 0 when either conjunct 
becomes 0, so the transition will occur when we hit the first of the conjuncts’ 
transition points (their minimum). Dually, a disjunction becomes 0 when both 
disjuncts become 0, so we will transition at the last of the disjuncts’ transition 
points (their maximum). 

Finally, the intuition behind $u$ and $v$ is that they “preprocess” the parameters 
so that we evaluate along the diagonal of the current box instead of $\lfloor \vec{0}, \vec{1} \rceil$. 
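
The quantitative semantics of Fig. 5 can be sketched in a few lines. The Python below is a minimal illustration under assumptions, not the authors' implementation: the names `ParamPred`, `Seq`, `And`, `quant`, `boundary`, and `vel_score` are hypothetical, trajectories are tuples of scalar velocities, parameter-free predicates are omitted for brevity, and sequencing is evaluated by naive recursion over all split points.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple, Union

Trajectory = Sequence[float]


@dataclass(frozen=True)
class ParamPred:
    """Predicate with a parameter hole: matches iff score(z) >= theta[hole]."""
    score: Callable[[Trajectory], float]
    hole: int


@dataclass(frozen=True)
class Seq:
    left: "Sketch"
    right: "Sketch"


@dataclass(frozen=True)
class And:
    left: "Sketch"
    right: "Sketch"


Sketch = Union[ParamPred, Seq, And]


def quant(q: Sketch, z: Trajectory, v: Sequence[float], u: Sequence[float]) -> float:
    """Number of u-sized steps from v to the boundary (may be +/- infinity)."""
    if isinstance(q, ParamPred):
        return (q.score(z) - v[q.hole]) / u[q.hole]
    if isinstance(q, And):
        return min(quant(q.left, z, v, u), quant(q.right, z, v, u))
    if isinstance(q, Seq):
        return max(
            min(quant(q.left, z[:k], v, u), quant(q.right, z[k:], v, u))
            for k in range(len(z) + 1)
        )
    raise TypeError(f"unknown sketch node: {q!r}")


def boundary(q: Sketch, z: Trajectory, v: Sequence[float], u: Sequence[float]):
    """Boundary parameter steps * u + v, or 'top' / 'bottom' if unbounded."""
    steps = quant(q, z, v, u)
    if steps == math.inf:
        return "top"
    if steps == -math.inf:
        return "bottom"
    return tuple(steps * ui + vi for ui, vi in zip(u, v))


def vel_score(z: Trajectory) -> float:
    """Hypothetical score: the velocity of a single time step, else -infinity."""
    return z[0] if len(z) == 1 else -math.inf


if __name__ == "__main__":
    sketch = Seq(ParamPred(vel_score, 0), ParamPred(vel_score, 1))
    unit_v, unit_u = (0.0, 0.0), (1.0, 1.0)  # diagonal of the unit box
    print(boundary(sketch, (0.5, 0.8), unit_v, unit_u))  # positive example
    print(boundary(sketch, (0.9, 0.6), unit_v, unit_u))  # negative example
```

On the unit box, the positive trajectory (0.5, 0.8) gives the boundary parameter (0.5, 0.5) and the negative trajectory (0.9, 0.6) gives (0.6, 0.6), each obtained from a single evaluation rather than a search. Along this particular diagonal the resulting pair is inconsistent, so further splitting would be needed to find consistent parameters.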


4.7 Implementation 


We implement our approach in a system called QUIVR. It begins by running 
Algorithm 1 on a small number of labeled examples. 


Active Learning. With a small number of examples, there are typically many 
queries that are consistent with the labels, and yet which disagree on the labels of 
the remaining data. To disambiguate, we use an active learning strategy, asking 
the user to label specific trajectories that we choose, which are then added to 
our set of labeled examples. Queries that are not consistent with the new label 
are discarded. The labeling process continues until the set of consistent queries 
agrees on the labels of all unlabeled data. 

When choosing the trajectory $z^*$ to query the user for next, we select the 
one on which the set of consistent queries $C$ disagrees most, i.e., 


$z^* = \arg\min_{z \in Z} \left| J(z) - \tfrac{1}{2} \right|,$ 


where 


$J(z) = \dfrac{1}{|C|} \sum_{Q_\theta \in C} \mathbb{1}\big(\psi_{\{(z,1)\}}(Q_\theta)\big)$ 


is the fraction of consistent queries that predict a positive label for trajectory $z$. 
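
One way to realize "disagrees most" is to pick the trajectory whose fraction of positive predictions is closest to one half. The Python sketch below takes that criterion as an assumption; the function names `positive_fraction` and `pick_most_contested` are hypothetical, and consistent queries are modeled as plain boolean functions on trajectories.

```python
from typing import Callable, Sequence, Tuple

Trajectory = Tuple[float, ...]
CompleteQuery = Callable[[Trajectory], bool]


def positive_fraction(z: Trajectory, queries: Sequence[CompleteQuery]) -> float:
    """J(z): fraction of consistent queries that label z as positive."""
    return sum(1 for q in queries if q(z)) / len(queries)


def pick_most_contested(
    candidates: Sequence[Trajectory], queries: Sequence[CompleteQuery]
) -> Trajectory:
    """Return the candidate whose positive fraction is closest to 1/2
    (assumed disagreement criterion; ties go to the earliest candidate)."""
    return min(candidates, key=lambda z: abs(positive_fraction(z, queries) - 0.5))


if __name__ == "__main__":
    # Four hypothetical consistent queries that differ only in a threshold.
    queries = [lambda z, t=t: z[0] >= t for t in (0.2, 0.4, 0.6, 0.8)]
    candidates = [(0.1,), (0.5,), (0.9,)]
    print(pick_most_contested(candidates, queries))
```

Here all four queries agree on the first and last candidates, while they split evenly on the middle one, so labeling it rules out half of the remaining queries whichever label the user gives.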


Search Implementation. In some cases, searching for consistent parameters may 
take a very long time. To improve performance, we impose a timeout: for each 
sketch, we pause search if either: (i) we find some consistent box of parameters 
or (ii) we’ve exceeded 25 steps. In both cases, we save the sets of consistent, 
inconsistent, and unknown boxes. At each step of active learning, the newly 
labeled example may render previously consistent parameters inconsistent, so 
we mark all consistent boxes as unknown. We then resume search, again until 
(i) we find some consistent box (which may be the same one we had before), or 
(ii) we again exceed 25 steps. 

Note that while this timeout may cause us to query the user more often than 
is strictly necessary, it does not affect either the soundness or completeness of 
our approach, as we continue search after querying the user. 


Complete Query Selection. Active learning and evaluation of $F_1$ scores (in Sect. 5) 
both require complete queries with specific parameters, rather than sketches 
with boxes of parameters. Since the set $C$ of consistent queries is infinitely large, 
we instead use one query for each sketch that is known to have consistent 
parameters (sketches where search timed out are thus not included). For those 
sketches, we pick the middle of the box of known-consistent parameters. 


Table 1. The predicates used for the YTStreams dataset. 

Predicate | Description 
$\mathrm{InLaneK}(A)$ | Whether, at every time-step in the interval, object $A$ is sufficiently close to the annotated curve for lane $K$ and $A$’s movement direction is sufficiently in line with the curve for $K$. 
$\mathrm{DurationNotShort}$ | Whether the interval spans at least 5 seconds. 
$\mathrm{AvgAccelGt}_\theta$ | Whether the average acceleration over the interval is at least $\theta$. 
$\mathrm{DistanceLt}_\theta$ | Whether, at every time-step in the interval, the distance between the two objects is less than $\theta$. 
$\mathrm{SpeedRatioGt}_\theta(A, B)$ | Whether, at every time-step in the interval, the speed of $A$ divided by the speed of $B$ is at least $\theta$. 
$\mathrm{DispLt}_\theta(A)$ | Whether the distance between the position in the first frame of the interval and the position in the last frame is less than $\theta$. 


5 Evaluation 


We demonstrate how our approach can be used to synthesize queries to solve 
interesting tasks: in particular, we show that (i) given just a few initial examples, 
it can synthesize queries that achieve good performance on a held-out test set, 
and (ii) our optimizations significantly reduce the synthesis time. 


5.1 Experimental Setup 


Datasets. We evaluate on two datasets of object trajectories: YTStreams [7], 
consisting of video and extracted object trajectories from fixed-position traffic 
cameras, and MABe22 [72], consisting of trajectories of up to three mice interact- 
ing in a laboratory setting. We also evaluate on a synthetic maritime surveillance 
task from the STL synthesis literature [44]. On YTStreams, we use two traffic 
cameras, one in Tokyo and one in Warsaw, and we consider single cars or pairs 
of cars. On MABe22, we consider pairs of mice. For the predicates used, see 
Table 1 for YTStreams, Appendix Table 5 for MABe22, and Appendix Table 6 
for maritime surveillance. 


Tasks. On YTStreams, we manually wrote 5 ground truth queries. Several queries 
apply to multiple configurations (e.g., different pairs of lanes), resulting in 10 
queries total (tasks H-Q in Table 2). The real-valued parameters were chosen 
manually, by visually examining whether they were selecting the desired tra- 
jectories. These queries cover a wide range of behaviors; for instance, they can 
Table 2. Ground-truth queries for the YTStreams dataset. “IDs” indicates which 
tasks are instances of a given query. Multiple instantiations correspond to different 
lanes being used for “lane 1” and “lane 2”. The first is a one-object Shibuya query, the 
second is a one-object Warsaw query, and the rest are two-object Warsaw queries. 

IDs | Query | Description 
H, I, J, K | $\langle \mathrm{InLane1}(A) \rangle \,;\, \langle \mathrm{Any} \rangle \,;\, \langle \mathrm{InLane2}(A) \rangle$ | Matches cars that turn, starting in lane 1 and ending in lane 2. 
L, M | $\langle \mathrm{InLane1}(A) \rangle \wedge \langle \mathrm{AvgAccelGt}_\theta(A) \rangle \wedge \langle \mathrm{DurationNotShort} \rangle$ | Matches cars that accelerate for a significant period of time while in lane 1. 
N | $(\langle \mathrm{InLane1}(A) \rangle \,;\, \langle \mathrm{Any} \rangle \,;\, \langle \mathrm{InLane2}(A) \rangle) \wedge \langle \mathrm{InLane2}(B) \rangle$ | Matches pairs of cars where car B is in lane 2 for the entire duration of A turning from lane 1 into lane 2. 
O, P | $\langle \mathrm{InLane1}(A) \rangle \wedge \langle \mathrm{InLane2}(B) \rangle \wedge \langle \mathrm{DurationNotShort} \rangle \wedge \langle \mathrm{SpeedFactorGt}_\theta(A, B) \rangle$ | Matches pairs of cars in parallel lanes, 1 and 2, where car A is going faster than car B for a significant period of time. 
Q | $\langle \mathrm{InLane1}(A) \rangle \wedge \langle \mathrm{InLane2}(B) \rangle \wedge \langle \mathrm{DurationNotShort} \rangle \wedge \langle \mathrm{DistanceLt}_\theta(A, B) \rangle$ | Matches pairs of cars in parallel lanes, 1 and 2, where the cars are close for a significant period of time. 


Fig. 6. Trajectories selected by multi-object queries. Each image shows two objects; 
the color of each one changes from red to green to denote the progression of time. Left: 
Unprotected right turn into lane with oncoming traffic. Middle: Bottom car drives 
faster than the top one and passes it. Right: One car driving closely behind the other. 
(Color figure online) 


capture behaviors such as human drivers making unprotected turns, an impor- 
tant challenge for autonomous cars [64], as well as cars trying to pass [66]. We 
show examples of trajectories selected by three of our multi-object queries in 
Fig. 6. MABe22 describes 9 queries for scientifically interesting mouse behavior. 
We implemented the 6 most complex to use as ground truth queries (tasks A-F 
in Appendix Table 7). The maritime surveillance task has trajectory labels and 
so does not need a ground truth query (task G). 


Synthesis. For each task, we divide the set $Z$ of all trajectories into a train 
set $Z_{\mathrm{train}}$ and a test set $Z_{\mathrm{test}}$, using trajectories in the first half of the video 
for training, and those in the second half for testing. We randomly sample a 
set of initial labeled examples $W$ from $Z_{\mathrm{train}}$, with 2 samples being positive 
and 10 being negative, and then actively label 25 additional examples from 
$Z_{\mathrm{train}}$. For YTStreams and MABe22, labels are from the ground truth query. 
Table 3. $F_1$ score after $n$ steps of active learning, with our algorithm for selecting tracks 
to label (“Q”), an active learning ablation (“R”), an LSTM (“L”), and a transformer 
(“T”). For Q and R, there may be many queries consistent with the labeled data, so 
the median $F_1$ score is reported. Bold indicates best score at a given number of steps. 

Task | 0 Steps: Q | R | L | T | 5 Steps: Q | R | L | T | 10 Steps: Q | R | L | T | 25 Steps: Q | R | L | T 
A | 0.69 | 0.69 | 1.00 | 0.74 | 1.00 | 1.00 | 1.00 | 0.74 | 1.00 | 1.00 | 1.00 | 0.74 | 1.00 | 1.00 | 1.00 | 0.74 
B | 0.99 | 0.99 | 0.47 | 0.20 | 0.99 | 0.99 | 0.47 | 0.25 | 0.99 | 0.99 | 0.47 | 0.05 | 0.99 | 0.98 | 0.47 | 0.06 
C | 0.96 | 0.96 | 0.38 | 0.09 | 0.99 | 0.96 | 0.38 | 0.08 | 0.99 | 0.96 | 0.38 | 0.02 | 0.99 | 0.98 | 0.38 | 0.01 
D | 0.77 | 0.77 | 0.52 | 0.27 | 0.99 | 0.96 | 0.52 | 0.28 | 0.99 | 0.99 | 0.52 | 0.32 | 0.99 | 1.00 | 0.52 | 0.08 
E | 1.00 | 1.00 | 0.44 | 0.29 | 1.00 | 1.00 | 0.44 | 0.14 | 1.00 | 1.00 | 0.44 | 0.13 | 1.00 | 1.00 | 0.44 | 0.07 
F | 0.88 | 0.88 | 0.78 | 0.38 | 0.99 | 0.96 | 0.78 | 0.39 | 1.00 | 0.96 | 0.78 | 0.18 | 1.00 | 0.96 | 0.78 | 0.27 
G | 0.68 | 0.68 | 0.65 | 0.78 | 1.00 | 1.00 | 0.65 | 0.77 | 1.00 | 1.00 | 0.65 | 0.77 | 1.00 | 1.00 | 0.65 | 0.77 
H | 0.30 | 0.30 | 0.12 | 0.22 | 0.34 | 0.34 | 0.13 | 0.23 | 0.92 | 0.92 | 0.13 | 0.22 | 0.92 | 0.92 | 0.13 | 0.37 
I | 0.37 | 0.37 | 0.13 | 0.00 | 1.00 | 0.37 | 0.13 | 0.00 | 1.00 | 1.00 | 0.13 | 0.00 | 1.00 | 1.00 | 0.13 | 0.3 
J | 0.07 | 0.07 | 0.01 | 1.00 | 0.41 | 0.07 | 0.01 | 0.86 | 0.80 | 0.09 | 0.04 | 0.75 | 0.80 | 0.09 | 0.04 | 0.86 
K | 0.28 | 0.28 | 0.15 | 0.00 | 0.99 | 0.27 | 0.15 | 0.00 | 0.99 | 0.99 | 0.15 | 0.00 | 0.99 | 0.99 | 0.15 | 0.00 
L | 0.67 | 0.67 | 0.07 | 0.37 | 0.96 | 0.88 | 0.07 | 0.42 | 0.96 | 0.88 | 0.07 | 0.08 | 0.96 | 0.88 | 0.07 | 0.3 
M | 0.92 | 0.92 | 0.10 | 0.37 | 0.99 | 0.92 | 0.10 | 0.46 | 0.99 | 0.92 | 0.10 | 0.00 | 0.99 | 0.92 | 0.10 | 0.18 
N | 0.60 | 0.60 | 0.02 | 0.00 | 0.20 | 0.09 | 0.02 | 0.00 | 0.11 | 0.21 | 0.02 | 0.00 | 0.18 | 0.78 | 0.02 | 0.3 
O | 0.11 | 0.11 | 0.01 | 0.04 | 0.50 | 0.17 | 0.01 | 0.21 | 0.70 | 0.17 | 0.01 | 0.21 | 1.00 | 0.21 | 0.01 | 0.00 
P | 0.16 | 0.16 | 0.04 | 0.04 | 0.23 | 0.21 | 0.03 | 0.04 | 0.82 | 0.21 | 0.03 | 0.14 | 1.00 | 0.29 | 0.03 | 0.14 
Q | 0.07 | 0.07 | 0.02 | 0.02 | 0.16 | 0.12 | 0.01 | 0.31 | 0.92 | 0.12 | 0.01 | 0.18 | 1.00 | 0.12 | 0.01 | 0.20 

For tractability, we limit search to sketches with at most three predicates, at 
most two of which may have parameters. In most cases, this excludes the ground 
truth from the search space. 


5.2 Accuracy of Synthesized Queries 


We show that QUIVR synthesizes accurate queries from just a few labeled examples. 
We evaluate the F1 score of the synthesized queries on Z_test. Recall that 
our algorithm returns a list C of consistent queries; we report the median F1 
score across Q ∈ C. 
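
The reported metric can be sketched as follows. This is a minimal illustration, 
not the evaluation code used for the experiments: queries are modeled as 
hypothetical Python callables that map a trajectory to True or False, and the 
names `f1_score` and `median_f1` are introduced here for illustration only. 

```python
from statistics import median
from typing import Callable, Sequence

Query = Callable[[object], bool]  # hypothetical: a query labels one trajectory


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); taken to be 0.0 if the denominator is 0."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def median_f1(consistent: Sequence[Query], test_set: Sequence[object],
              labels: Sequence[bool]) -> float:
    """Median test-set F1 over all queries consistent with the labeled data."""
    scores = [f1_score([q(z) for z in test_set], labels) for q in consistent]
    return median(scores)


# Toy test set: "trajectories" are numbers, ground truth is z >= 3.
test_set = [1, 2, 3, 4]
labels = [False, False, True, True]
consistent = [lambda z: z >= 3, lambda z: z >= 2, lambda z: z >= 4]
score = median_f1(consistent, test_set, labels)
```

In this toy case the three candidate queries score 1.0, 0.8, and 2/3, so the 
median is 0.8. Reporting the median avoids crediting the method with the best 
consistent query, which could not be identified without test labels. 
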


Baselines. We compare to (i) an ablation where we replace our active learning 
strategy with an approach that labels z uniformly at random from the remaining 
unlabeled training examples; (ii) an LSTM [16,33] neural network; and (iii) a 
transformer neural network [26,29,77]. Because neural networks perform poorly 
on such small datasets, we pretrain the LSTM on an auxiliary task, namely, 
trajectory forecasting [43]. Then, we freeze the hidden representation of the 
learned LSTM and use it as features to train a logistic regression model 
on our labeled examples. The neural network baselines do active learning by 
selecting among the unlabeled trajectories the one with the highest predicted 
probability of being positive. 
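
The two baseline labeling strategies described above can be sketched as 
follows. This is an illustrative sketch only: the function names are 
hypothetical, and `prob_positive` stands in for whatever trained model supplies 
the predicted probability of the positive class. 

```python
import random
from typing import Callable, Hashable, Sequence


def select_random(unlabeled: Sequence[Hashable], rng: random.Random) -> Hashable:
    """Ablation (i): choose the next example to label uniformly at random."""
    return rng.choice(list(unlabeled))


def select_most_likely_positive(
    unlabeled: Sequence[Hashable],
    prob_positive: Callable[[Hashable], float],
) -> Hashable:
    """Neural baselines: choose the unlabeled example with the highest
    predicted probability of being positive."""
    return max(unlabeled, key=prob_positive)


# Toy predicted probabilities for three unlabeled trajectories.
scores = {"traj_a": 0.2, "traj_b": 0.9, "traj_c": 0.5}
chosen = select_most_likely_positive(list(scores), scores.get)
```

Here `chosen` is `"traj_b"`, the trajectory the model considers most likely to 
be positive. 
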


Results. We show the F1 score of each of the 17 queries in Table 3 after 0, 5, 10, 
and 25 steps of active learning. After just 10 steps, our approach provides an F1 
score above 0.99 on 10 of 17 queries, and after 25 steps, it yields an F1 score 
above 0.9 on all but 2 queries. Thus, QUIVR is able to synthesize accurate queries 
with relatively little user input. The neural networks achieve poor performance, 
particularly on the more difficult queries. 


Table 4. Running time (seconds) of synthesis (mean ± standard error) using binary 
search (B) and quantitative semantics (Q) running on CPU and GPU, with 25 steps 
of active learning. 


   | CPU                              | GPU 
ID | B              | Q               | B             | Q 
A  | 8,460 ± 1,517  | 3,343 ± 202     | 737 ± 36      | 174 ± 14 
B  | 3,511 ± 549    | 2,291 ± 237     | 428 ± 37      | 110 ± 9 
C  | 3,319 ± 505    | 2,007 ± 359     | 376 ± 6       | 113 ± 9 
D  | 2,728 ± 476    | 2,714 ± 334     | 370 ± 8       | 119 ± 2 
E  | 1,264 ± 176    | 599 ± 54        | 225 ± 3       | 50 ± 1 
F  | 1,689 ± 360    | 748 ± 81        | 285 ± 7       | 60 ± 1 
G  | 661 ± 141      | 133 ± 23        | 219 ± 77      | 30 ± 1 
H  | 399 ± 70       | 147 ± 9         | 185 ± 94      | 32 ± 17 
I  | 400 ± 74       | 84 ± 13         | 163 ± 85      | 23 ± 12 
J  | 544 ± 120      | 173 ± 5         | 227 ± 121     | 36 ± 19 
K  | 493 ± 77       | 125 ± 25        | 163 ± 83      | 30 ± 16 
L  | 732 ± 47       | 286 ± 73        | 252 ± 133     | 57 ± 29 
M  | 697 ± 40       | 253 ± 49        | 245 ± 128     | 56 ± 30 
N  | 5,691 ± 272    | 977 ± 176       | 1,393 ± 590   | 264 ± 136 
O  | 8,306 ± 521    | 2,314 ± 476     | 811 ± 12      | 127 ± 2 
P  | 11,326 ± 673   | 4,198 ± 1,333   | 970 ± 60      | 167 ± 8 
Q  | 12,430 ± 962   | 2,915 ± 508     | 1,141 ± 101   | 183 ± 11 


5.3 Synthesis Running Time 


Next, we show that quantitative pruning and using a GPU each significantly 
reduce synthesis time, evaluating total running time for 25 steps of active learn- 
ing. 


Ablations. We compare to two ablations: (i) using the binary search approach 
of [53] to find pruning pairs, rather than using our quantitative semantics, and 
(ii) evaluating the matrix semantics (Appendix A.1) on a CPU rather than a 
GPU. 


Results. In Fig. 4, we report the running time of our algorithms on a CPU (2x 
AMD EPYC 7402 24-Core) and a GPU (1x NVIDIA RTX A6000). For binary 
search, on average, the GPU is 7.6x faster than the CPU. On a GPU, using the 
quantitative semantics rather than binary search offers another 5.0x speed-up. 
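
The two speed-up figures can be checked against the mean running times in 
Table 4. The sketch below assumes that "on average" refers to the ratio of 
total running time summed over the 17 queries; this reading is an assumption, 
not something the text states. It also reads the GPU/Q entry for query C as 
113 seconds. The variable and function names are illustrative. 

```python
# Mean running times in seconds for queries A..Q, copied from Table 4.
CPU_BINARY = [8460, 3511, 3319, 2728, 1264, 1689, 661, 399, 400,
              544, 493, 732, 697, 5691, 8306, 11326, 12430]
GPU_BINARY = [737, 428, 376, 370, 225, 285, 219, 185, 163,
              227, 163, 252, 245, 1393, 811, 970, 1141]
GPU_QUANT = [174, 110, 113, 119, 50, 60, 30, 32, 23,
             36, 30, 57, 56, 264, 127, 167, 183]


def aggregate_speedup(slow, fast):
    """Ratio of total running times (assumed meaning of 'on average')."""
    return sum(slow) / sum(fast)


gpu_over_cpu = aggregate_speedup(CPU_BINARY, GPU_BINARY)    # binary search
quant_over_binary = aggregate_speedup(GPU_BINARY, GPU_QUANT)  # on the GPU
print(f"GPU vs CPU: {gpu_over_cpu:.2f}x, Q vs B: {quant_over_binary:.2f}x")
```

Under these assumptions the ratios are roughly 7.65 and 5.02, in line with the 
reported 7.6x and 5.0x. 
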


6 Related Work 


Monotonicity for Parameter Pruning. We build on [49] for our parameter pruning 
algorithm. Their approach has been applied to synthesizing STL formulas for 
sequence classification by first enumerating sketches and then using monotonicity 
to find parameters, similar to our binary search baseline [53]. We replace binary 
search with our novel strategy based on quantitative semantics, leading to a 5.0x 
speedup. There is also work building on [49] to create logically-relevant distance 
metrics between trajectories by taking the Hausdorff distance between parameter 
satisfaction regions (which they call “validity domains”), with applications to 
clustering [78]. For logics like STL, our quantitative semantics could provide a 
speedup to their approach. 
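
To make the contrast with binary search concrete, consider a toy one-parameter 
predicate that is monotone in its threshold: "some step of the trajectory has 
value at least theta". This predicate, the example values, and the function 
names are illustrative assumptions and are not taken from our query language. 
Binary search approaches the satisfaction boundary through repeated Boolean 
evaluations, whereas a quantitative evaluation returns it in a single pass. 

```python
from typing import Sequence, Tuple


def satisfies(traj: Sequence[float], theta: float) -> bool:
    """Boolean semantics: true iff some step has value >= theta.
    Monotone: if it holds for theta, it holds for every smaller theta."""
    return any(x >= theta for x in traj)


def boundary_by_binary_search(traj: Sequence[float], lo: float, hi: float,
                              tol: float = 1e-6) -> Tuple[float, int]:
    """Approximate the largest satisfying theta in [lo, hi] and count the
    Boolean evaluations used. Assumes lo satisfies and hi does not."""
    evaluations = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        evaluations += 1
        if satisfies(traj, mid):
            lo = mid   # boundary is at or above mid
        else:
            hi = mid   # boundary is below mid
    return lo, evaluations


def boundary_quantitative(traj: Sequence[float]) -> float:
    """Quantitative semantics for this toy predicate: the boundary is the
    maximum value along the trajectory, computed in one pass."""
    return max(traj)


traj = [0.2, 0.7, 0.4]
approx, evaluations = boundary_by_binary_search(traj, lo=0.0, hi=1.0)
exact = boundary_quantitative(traj)
```

With a tolerance of 1e-6 on the interval [0, 1], binary search needs 20 
evaluations of the predicate to approximate the boundary 0.7, while the 
quantitative evaluation returns 0.7 exactly from one traversal of the 
trajectory. 
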


Synthesis of Temporal Logic Formulas. More broadly, there has been work syn- 
thesizing parameters in a variant of STL by discretizing the parameter space and 
then walking the satisfaction boundary [24]; in one dimension, their approach 
becomes binary search, inheriting its shortcomings. There has been work on syn- 
thesizing STL formulas that are satisfied by a closed-loop control model [38], but 
they assume the ability to find counterexample traces for incorrect STL formu- 
las, which is not applicable to our setting. Another approach is to synthesize 
parameters in STL formulas using gradient-based optimization [35] or stochastic 
optimization [45], but we found these methods to be ineffective in our setting, 
and they do not come with either soundness or completeness guarantees. There 
is work using decision trees to synthesize STL formulas [1,14,44,48], but these 
operate on a restricted subset of STL, namely Boolean combinations of a fixed 
set of template formulas. This restriction prevents these approaches from syn- 
thesizing temporal structure, which is a key component of the queries in our 
domains. Finally, there has been work on active learning of STL formulae using 
decision trees [48], but it assumes the ability to query for equivalence between 
a particular STL formula and the ground truth, which is not possible in our 
setting. 


Synthesizing Constants. There is work on synthesizing parameters of programs 
using counterexample-guided inductive synthesis and different theory solvers, 
including Fourier-Motzkin variable elimination and an SMT solver [2]. Though 
our synthesis objective can be encoded in the theory of linear arithmetic, it is 
extremely large, and we have found such solvers to be ineffective in practice. 


Querying Video Data. There has been recent work on querying object detections 
and trajectories in video data [5,7,8,28,40-42,54]. The main difference is 
our focus on synthesis; in addition, these approaches focus on SQL-like opera- 
tors such as select, inner-join, group-by, etc., over predefined predicates, which 
cannot capture compositions such as the sequencing and iteration operators in 
our language, which are necessary for identifying more complex behaviors. 


Neurosymbolic Models. There has been recent work on leveraging program syn- 
thesis in the context of machine learning. For instance, there has been work 
on using programs to represent high-level structure in images [21-23,74,84], 
for reinforcement learning [9,34,79,80], and for querying websites [18]; in con- 
trast, we use programs to classify trajectories. The most closely related work is 
on synthesizing functional programs operating over lists [67,76]. Our language 
includes key constructs not included in their languages. Most importantly, we 
include sequencing; in their functional language, such an operator would need 
to be represented as a nested series of if-then-else operators. In addition, their 
language does not support predicates that match subsequences; while such a 
predicate could be added, none of their operators can compose such predicates. 


Quantitative Synthesis. There has been work on program synthesis with quantitative 
properties, e.g., on synthesis for producing optimized code [37,57,65], 
for approximate computing [15,52], for probabilistic programming [56], and for 
embedded control [17]. These approaches largely focus on search-based synthesis, 
using constraint optimization [52], continuous optimization [17], enumerative 
search [15,57], or stochastic search [37,56,65]. While we leverage ideas from 
this literature, our quantitative-semantics-based pruning strategy is novel. 


Quantitative Semantics. Our quantitative semantics is similar to the “robustness 
degree” [25] of a temporal logic formula. The difference is that, by adjusting the 
denotations of the base predicates, our quantitative semantics gives a parameter 
on the satisfaction boundary. More broadly, there has been work on quantitative 
semantics for temporal logic for robust constraint satisfaction [20,25,73], 
and to guide reinforcement learning [39]. There has been work on quantitative 
regular expressions (QREs) [4], though in general, QREs cannot be efficiently 
evaluated due to their nondeterminism, and our language is restricted to ensure 
efficient computation. There has been work on synthesizing QREs for network 
traffic classification [68], using binary search to compute decision thresholds. 
Similarly, there has been work using the Viterbi semiring to obtain quantita- 
tive semantics for Datalog programs [69], which they use in conjunction with 
gradient descent to learn the rules of the Datalog program. In contrast, we use 
our quantitative semantics to efficiently prune the parameter search space in a 
provably correct way. Finally, there has been work on using GPUs to evaluate 
regular expressions [55]; however, they focus on regular expressions over strings. 


Query Languages. Our language is closely related to both signal temporal logic 
(STL) [50] and Kleene algebras with tests (KAT) [46]. In particular, it can 
straightforwardly be extended to subsume both (see Appendix A for details), 
and our pruning strategy applies to this extended language. The addition of 
Kleene star, required to subsume KAT, worsens the evaluation time. STL has 
been used to monitor safety requirements for autonomous vehicles [32]. Spatio- 
Temporal Perception Logic (SPTL) is an extension of STL to support spatial 
reasoning [31]. Many of its operators are monotone, and thus would benefit 
from our algorithm. Scenic [27] is a DSL for creating static and dynamic driving 
scenes, but its focus is on generating scenes rather than querying for behaviors. 


7 Conclusion 


We have proposed a novel framework called QUIVR for synthesizing queries over 
video trajectory data. Our language is similar to KAT and STL, but supports 
conjunction and sequencing over multi-step predicates. Given only a few exam- 
ples, QUIVR efficiently synthesizes trajectory queries consistent with those exam- 
ples. A key contribution of our approach is the use of a quantitative semantics to 
prune the parameter search space, yielding a 5.0x speedup over the state of the 
art. In our evaluation, we demonstrate that QUIVR effectively synthesizes queries 
to identify interesting driving behaviors, and that our optimizations dramatically 
reduce synthesis time. 